arXiv:1502.05457v1 [math.ST] 19 Feb 2015


A note on an Adaptive Goodness-of-Fit test with Finite Sample 
Validity for Random Design Regression Models 


Pierpaolo Brutti

Department of Statistics, Sapienza University of Rome 


Abstract 

Given an i.i.d. sample $\{(X_i, Y_i)\}_{i \in \{1,\dots,n\}}$ from the random design regression model
$Y = f(X) + \varepsilon$ with $(X, Y) \in [0,1] \times [-M, M]$, in this paper we consider the problem of testing
the (simple) null hypothesis "$f = f_0$", against the alternative "$f \neq f_0$" for a fixed
$f_0 \in L^2([0,1], G_X)$, where $G_X(\cdot)$ denotes the marginal distribution of the design
variable $X$. The procedure proposed is an adaptation to the regression setting of a multiple testing technique
introduced by Fromont and Laurent [22], and it amounts to considering a suitable collection of unbiased estimators
of the $L^2$-distance $d_2(f, f_0) = \int [f(x) - f_0(x)]^2 \, dG_X(x)$, rejecting the null hypothesis when at least
one of them is greater than its $(1 - u_\alpha)$ quantile, with $u_\alpha$ calibrated to obtain a level-$\alpha$ test.
To build these estimators, we will use the warped wavelet basis introduced by Picard and Kerkyacharian [51]. We do
not assume that the errors are normally distributed, and we do not assume that $X$ and $\varepsilon$ are independent
but, mainly for technical reasons, we will assume, as in most of the current literature in learning theory, that
$|f(x) - y|$ is uniformly bounded (almost everywhere). We show that our test is adaptive over a particular collection
of approximation spaces linked to the classical Besov spaces.

Keywords: Nonparametric Regression; Random Design; Goodness-of-fit; Adaptive test; Separation Rates; Warped 
Wavelets; U-statistics; Multiple Test. 


1 Introduction 

Consider the usual nonparametric regression problem with random design. In this model we observe an
i.i.d. sample $\mathcal{D}_n = \{Z_i = (X_i, Y_i)\}_{i \in \{1,\dots,n\}}$ from the distribution of a vector
$Z = (X, Y)$ where

$$Y = f(X) + \varepsilon,$$

for $(X, \varepsilon)$ a random vector with $\mathbb{E}(\varepsilon|X) = 0$ and $\mathbb{E}(\varepsilon^2|X) < \infty$
almost surely. The regression function is known to belong to a subset $\mathcal{F}$ of $L^2([0,1], G_X)$ for $G_X$
the marginal distribution of $X$. Let $f_0 \in \mathcal{F}$ be fixed.
In this paper we consider the problem of testing the (simple) null hypothesis "$H_0 : f = f_0$" against the
alternative "$H_1 : f \neq f_0$". Since $f \in L^2([0,1], G_X)$, it seems natural to consider a test statistic somehow
linked to an estimator of the (weighted) $L^2$-distance $d_2(f, f_0) = \int [f(x) - f_0(x)]^2 \, dG_X(x)$. The approach
considered in the present paper is an adaptation to the regression setting with random design of the
work by Fromont and Laurent [22] for density models, and it amounts to considering a suitable collection of
unbiased estimators for $d_2(f, f_0)$, rejecting the null hypothesis when at least one of them is greater than
its $(1 - u_\alpha)$ quantile, with $u_\alpha$ calibrated to obtain a level-$\alpha$ test.

After Ingster's seminal paper [40], and Hart's influential book [35], many authors have been concerned
with the construction of nonparametric tests on the unknown function that appears in a regression or
Gaussian white noise model. In papers like [18], [42], [33], or more recently Härdle and Kneip [32],
Lepski and Spokoiny [55], Lepski and Tsybakov [56], and Gayraud and Pouet [25], the authors tackle
the non-adaptive case by specifying a particular functional/smoothness class to which $f(\cdot)$ belongs, and
then evaluating the minimal separation/distance between the null hypothesis and the set of alternatives
for which testing with a prescribed error probability is still possible. Hard-coding the smoothness class of
choice into any statistical procedure is clearly impractical and unattractive. For this reason, much of the
effort has then been dedicated to exploring the adaptive case, where the smoothness level is also supposed to be
unknown. So, for example, in [26] and [37] Gayraud and Pouet on one side and Horowitz and Spokoiny
on the other deal with the adaptive case for a composite null hypothesis and suitable smoothness classes
(e.g. Hölder spaces), whereas Fromont and Lévy-Leduc in [23] consider the problem of periodic signal
detection in a Gaussian fixed design regression framework, when the signal belongs to some periodic
Sobolev balls. Fan, Zhang and Zhang [20] and Fan and Zhang [21], using a generalized likelihood
ratio, give adaptive results when the alternatives lie in a range of Sobolev balls, also highlighting a
nonparametric counterpart of the so-called Wilks phenomenon well known in parametric inference. In
[63] and [64] Spokoiny considers testing a simple hypothesis under a Gaussian white noise model over
an appropriate collection of Besov balls. Quite relevant is also the work of Baraud, Huet and Laurent [4]
where the assumptions on $f(\cdot)$ are reduced to a minimum thanks to the adoption of a discrete distance that
approximates the usual $L^2$-norm to measure separation between the null and the alternative hypothesis.

Similar problems have been widely studied in the testing literature. To briefly summarize the basic
notions and notation regarding hypothesis testing, consider the following general setting where we have
an observation $Y$ coming from a distribution $G_Y(\cdot)$, and we are interested in testing the (composite) null
hypothesis $H_0 : G_Y \in \mathcal{G}_0$, where $\mathcal{G}_0$ denotes a family of probability measures, against the
alternative $H_1 : G_Y \notin \mathcal{G}_0$. To accomplish this task, we need to define a test function $T_\alpha(Y)$;
that is, a measurable function of $Y$ that takes values in $\{0,1\}$, such that, given a testing level
$\alpha \in (0,1)$, we reject $H_0$ every time $T_\alpha(Y) = 1$. The value $\alpha$ is the testing level of our
procedure in the sense that we require

$$\sup_{G \in \mathcal{G}_0} P_G\{T_\alpha(Y) = 1\} \le \alpha.$$

For each $G \notin \mathcal{G}_0$, the Type II error of our testing procedure on $G(\cdot)$ is defined by

$$\beta(G, T_\alpha(Y)) = P_G\{T_\alpha(Y) = 0\},$$

whereas the power of the test on $G(\cdot)$ is given by

$$\pi(G, T_\alpha(Y)) = 1 - \beta(G, T_\alpha(Y)).$$

Of course, an easy way to choose a testing method would be to select the most powerful one (i.e. the one
with the smallest Type II error) within the class of level-$\alpha$ tests. In general, the closer
$G \notin \mathcal{G}_0$ is to $\mathcal{G}_0$, the more difficult it is to separate the null from the alternative
hypothesis, and consequently, the smaller is the power of a test on that particular $G(\cdot)$. This obvious fact
naturally leads to defining the notion of separation rate of a test $T_\alpha(Y)$ over a functional class
$\mathcal{G}$ with respect to a distance $d(\cdot)$, as follows:

$$\rho(T_\alpha(Y), \mathcal{G}, \beta) = \inf\Big\{ \rho > 0 : \sup_{G \in \mathcal{G} \,:\, d(G, \mathcal{G}_0) \ge \rho} P_G\{T_\alpha(Y) = 0\} \le \beta \Big\}.$$

In words, $\rho(T_\alpha(Y), \mathcal{G}, \beta)$ is the minimal distance from $\mathcal{G}_0$ starting from which our
testing procedure has a Type II error smaller than $\beta$ uniformly over $\mathcal{G}$. From here, we can
immediately define the (non-asymptotic) $(\alpha, \beta)$ minimax rate of testing over the class $\mathcal{G}$ as
follows:

$$\inf_{T_\alpha \in \mathcal{T}_\alpha} \rho(T_\alpha, \mathcal{G}, \beta),$$

where $\mathcal{T}_\alpha$ denotes the class of test statistics that are associated to $\alpha$-level tests.

If we have a complete characterization of the class $\mathcal{G}$, in general we are able to build a testing
procedure that explicitly depends on $\mathcal{G}$, and attains the minimax separation rate over $\mathcal{G}$
itself. However, as we already said, it is extremely unsatisfying to have $\mathcal{G}$ hard-coded in our
technique. A more interesting task, in fact, would be to build adaptive testing methods that simultaneously
(nearly) attain the minimax separation rate over a broad range of reasonable classes without using any prior
knowledge about the law of the observations. Eubank and Hart [19] propose to test that a regression function is
identically zero using a test function based on Mallows' $C_p$ penalty. Antoniadis, Gijbels and Grégoire [1],
once again in a regression setting, develop an automatic model selection procedure that they also apply to the
same testing problem. Spokoiny [63], instead, considers a Gaussian white noise model
$dX(t) = f(t)dt + \varepsilon\, dW(t)$, and proposes to test "$f = 0$" adaptively using a wavelet based procedure.
He also studies the (asymptotic) properties of his approach and shows that, in general, adaptation is not possible
without some loss of efficiency of the order of an extra $\log\log(n)$ factor, where $n$ is the sample size (see
Section 2.2). In the same setting, Ingster [43] builds an adaptive test based on chi-square statistics, and
studies its asymptotic properties.

Many authors have also considered the problem of testing convex or qualitative hypothesis like the 
monotonicity of the regression function: Bowman, Jones and Gijbels |8|; Hall and Heckman USTI : Gijbels, 
Hall, Jones and Koch [28]; Ghosal, Sen and van der Vaart [27], are just a few examples. In [17], instead,
Dümbgen and Spokoiny consider the problem of testing the positivity, monotonicity and convexity of the
function $f(\cdot)$ that appears in a Gaussian white noise model. They also evaluate the separation rates of
their procedures, showing in this way their optimality. See also Juditsky and Nemirovski [47].

The literature regarding goodness-of-fit testing in a density model is also vast. Bickel and Ritov
[6]; Ledwina [54]; Kallenberg and Ledwina [49]; Inglot and Ledwina [39]; Kallenberg [48], for instance,
propose tests inspired by Neyman [60] where the parameter that enters the definition of the test statistic
(in general a smoothing parameter) is estimated by some automatic data dependent criterion like BIC. In
general, only the asymptotic performances of these tests have been studied in some detail.

The pre-testing approach considered in the paper by Fromont and Laurent [22] has been initiated
by Baraud, Huet and Laurent [3, 4, 5] for the problem of testing linear or qualitative hypotheses in the
Gaussian regression model. One nice feature of their approach is that the properties of the procedures 
are non asymptotic. For any given sample size n, the tests have the desired level and we are able to 
characterize a set of alternatives over which they have a prescribed power. It is interesting to notice that 
the method proposed by Fromont and Laurent to build a test function essentially amounts to penalize by 
the appropriate quantile under the null, an unbiased estimator of projections of the L^-distance between 
densities. Other papers where U-statistics have been used to build test functions are: lT^l5^l58llT0l . 

This paper is organized as follows. In Section 2 we describe the testing procedure. In Section 2.1
we review the concept of warped wavelet basis proposed by Kerkyacharian and Picard and we establish the type of
alternatives against which our test has a guaranteed power. Then, in Section 2.2, together with a brief simulation
study, we show that our procedure is adaptive over some collection of warped Besov spaces in the sense that
it achieves the optimal "adaptive" rate of testing over all the members of this collection simultaneously.
Finally, Section 4 contains the proofs of the results presented.


2 A Goodness-of-Fit Test 


As anticipated in the previous section, the framework we shall work with in this paper is the usual
nonparametric regression problem with random design. In this model we observe an i.i.d. sample
$\mathcal{D}_n = \{Z_i = (X_i, Y_i)\}_{i \in \{1,\dots,n\}}$ from the distribution of a vector $Z = (X, Y)$ described
structurally as

$$Y = f(X) + \varepsilon,$$

for $(X, \varepsilon)$ a random vector with $\mathbb{E}(\varepsilon|X) = 0$ and $\mathbb{E}(\varepsilon^2|X) < \infty$
almost surely. The regression function is known to belong to a subset $\mathcal{F}$ of $L^2([0,1], G_X)$ for $G_X$
the marginal distribution of $X$, which is assumed known. As explained in [?], the assumption on $G_X$ is surely
unpleasant but unavoidable: the radius of the confidence set will be inflated in varying amount depending on the
conditions imposed on $G_X$, so we postpone the treatment of this case to a forthcoming paper. The variance function
$\sigma^2(x) = \mathbb{E}(\varepsilon^2|X = x)$ need not be known, although a known upper bound on
$\|\sigma^2\|_\infty$ is needed. We do not assume that the errors are normally distributed, and we do not assume
that $X$ and $\varepsilon$ are independent but, mainly for technical reasons, we will assume, as in most of the
current literature in learning theory (see [?]), that $|f(x) - y|$ is uniformly bounded (almost everywhere) by a
positive constant $M$. Doing so, all the proofs will be greatly simplified without moving too far away from a
realistic (although surely not minimal) set of assumptions (in particular considering the finite-sample scope of
the analysis). Clearly this condition overrules the one on the conditional variance mentioned before.

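As is often the case in nonparametric statistics, we could cast this example into a problem of
estimating a sequence $\theta = [\theta_1, \theta_2, \dots]$ of parameters by expanding $f(\cdot)$ on a fixed
orthonormal basis $\{e_\ell\}_\ell$ of $L^2([0,1], G_X)$. The Fourier coefficients take the form

$$\theta_\ell = \langle f, e_\ell \rangle_{L^2(G_X)} = \int f(x)\, e_\ell(x)\, dG_X(x) = \mathbb{E}\{Y e_\ell(X)\},$$

and they can be estimated unbiasedly by $W_\ell = \frac{1}{n} \sum_{i=1}^n Y_i e_\ell(X_i)$, although it appears not
so useful to move directly in sequence space by considering $[W_1, W_2, \dots]$ as the observation vector. What we
propose is a goodness-of-fit test similar to the one introduced in [22]. To describe it, let $f_0(\cdot)$ be some
fixed function in $L^2([0,1], G_X)$ and $\alpha \in (0,1)$. Now suppose that our goal is to build a level-$\alpha$
test of the null hypothesis $H_0 : f = f_0$ against the alternative $H_1 : f \neq f_0$ from the data
$\{Z_i\}_{i \in \{1,\dots,n\}}$. The test is based on an estimation of

$$\|f - f_0\|^2_{L^2(G_X)} = \|f\|^2_{L^2(G_X)} + \|f_0\|^2_{L^2(G_X)} - 2 \langle f, f_0 \rangle_{L^2(G_X)}.$$

Since the last (linear) term $\langle f, f_0 \rangle_{L^2(G_X)}$ can be easily estimated by the empirical estimator
$\frac{1}{n} \sum_{i=1}^n Y_i f_0(X_i)$, the key problem is the estimation of the first term $\|f\|^2_{L^2(G_X)}$.
Adapting the arguments in [?] we can consider an at most countable collection of linear subspaces of
$L^2([0,1], G_X)$ denoted by $\mathcal{S} = \{S_k\}_{k \in K}$. For all $k \in K$, let $\{e_\ell\}_{\ell \in I_k}$ be
some orthonormal basis of $S_k$. The estimator

$$\hat\theta_{n,k} = \frac{2}{n(n-1)} \sum_{i=2}^{n} \sum_{j=1}^{i-1} \sum_{\ell \in I_k} [Y_i e_\ell(X_i)] \cdot [Y_j e_\ell(X_j)]
= \frac{2}{n(n-1)} \sum_{i=2}^{n} \sum_{j=1}^{i-1} h_k(Z_i, Z_j) \qquad (1)$$

is a U-statistic of order two for $\|\Pi_{S_k}(f)\|^2_{L^2(G_X)} = \sum_{\ell \in I_k} \theta_\ell^2$, where
$\Pi_{S_k}(\cdot)$ denotes the orthogonal projection onto $S_k$, with kernel

$$h_k(z_1, z_2) = \sum_{\ell \in I_k} [y_1 e_\ell(x_1)] \cdot [y_2 e_\ell(x_2)], \qquad z_i = (x_i, y_i), \ i \in \{1,2\}.$$

Then, for any $k \in K$, $\|f - f_0\|^2_{L^2(G_X)}$ can be estimated by

$$R_{n,k} = \hat\theta_{n,k} + \|f_0\|^2_{L^2(G_X)} - \frac{2}{n} \sum_{i=1}^{n} Y_i f_0(X_i)
\overset{(\star)}{=} U_{n,k} + 2(\mathbb{P}_n - P)(\Pi_{S_k}(f) - f) - \|\Pi_{S_k}(f) - f\|^2_{L^2(G_X)}
+ 2(\mathbb{P}_n - P)(f - f_0) + \|f - f_0\|^2_{L^2(G_X)}, \qquad (2)$$

where

$$U_{n,k} = \frac{2}{n(n-1)} \sum_{i=2}^{n} \sum_{j=1}^{i-1} \sum_{\ell \in I_k} [Y_i e_\ell(X_i) - \theta_\ell] \cdot [Y_j e_\ell(X_j) - \theta_\ell]
= \frac{2}{n(n-1)} \sum_{i=2}^{n} \sum_{j=1}^{i-1} g_k(Z_i, Z_j), \qquad (3)$$

and, for each $w \in L^2([0,1], G_X)$,

$$\mathbb{P}_n(w) = \frac{1}{n} \sum_{i=1}^{n} Y_i w(X_i) \quad \text{and} \quad P(w) = \int f(x) w(x)\, dG_X(x) = \langle f, w \rangle_{L^2(G_X)},$$

so that

$$\mathbb{E}_{(X,Y)}\{\mathbb{P}_n(w)\} = \mathbb{E}_{(X,Y)}\{Y w(X)\} = \mathbb{E}_{G_X}\{f(X) w(X)\} = \int f(x) w(x)\, dG_X(x) = P(w).$$

The equality $(\star)$ can be derived from the Hoeffding decomposition of $\hat\theta_{n,k}$ as explained in
Section 4.1.

Now that we have an estimator $R_{n,k}$, let us denote by $r_{n,k}(u)$ its $1 - u$ quantile under $H_0$, and consider

$$u_\alpha = \sup\Big\{ u \in (0,1) : P^{\otimes n}_{f_0}\Big[ \sup_{k \in K} \{R_{n,k} - r_{n,k}(u)\} > 0 \Big] \le \alpha \Big\},$$

where $P^{\otimes n}_{f_0}\{\cdot\}$ is the law of the observations $\{Z_i\}_{i \in \{1,\dots,n\}}$ under the null
hypothesis. Then introduce the test statistic $R_\alpha$ defined by

$$R_\alpha = \sup_{k \in K} \{R_{n,k} - r_{n,k}(u_\alpha)\},$$

so that we reject the null whenever $R_\alpha$ is positive.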

This method, adapted to the regression setting from [22], amounts to a multiple testing procedure.
Indeed, for all $k \in K$, we construct a level-$u_\alpha$ test by rejecting $H_0 : f = f_0$ if $R_{n,k}$ is greater
than its $(1 - u_\alpha)$ quantile under $H_0$. After this, we are left with a collection of tests and we decide to
reject $H_0$ if, for some of the tests in the collection, the hypothesis is rejected. In practice, the value of
$u_\alpha$ and the quantiles $\{r_{n,k}(u_\alpha)\}_{k \in K}$ are to be estimated by (smoothed) bootstrap (see [36, ?]).
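
The calibration step just described can be sketched numerically. The snippet below is only an illustration under stated assumptions, not the procedure used in the paper: it assumes that a matrix of statistics simulated under the null is already available (one column per index k), it searches u on a finite grid, and, for brevity, it reuses the same simulated sample for the quantiles and for the rejection probability. The function names `calibrate_u_alpha` and `multiple_test` are hypothetical.

```python
import numpy as np

def calibrate_u_alpha(null_stats, alpha, grid_size=200):
    """Grid search for u_alpha.

    null_stats: array of shape (B, K); row b holds the K statistics R_{n,k}
    computed on the b-th sample simulated under H0 (assumed given).
    Returns the largest grid value u in (0, alpha] whose estimated probability
    of {max_k (R_{n,k} - r_{n,k}(u)) > 0} does not exceed alpha, together
    with the corresponding estimated quantiles r_{n,k}(u).
    """
    null_stats = np.asarray(null_stats, dtype=float)
    best_u, best_q = None, None
    for u in np.linspace(alpha / grid_size, alpha, grid_size):
        q = np.quantile(null_stats, 1.0 - u, axis=0)      # estimated r_{n,k}(u)
        reject_rate = np.mean(np.any(null_stats > q, axis=1))
        if reject_rate <= alpha:
            best_u, best_q = u, q                         # keep the largest admissible u
    return best_u, best_q

def multiple_test(observed_stats, quantiles):
    """Reject H0 as soon as one statistic exceeds its calibrated quantile."""
    return bool(np.any(np.asarray(observed_stats) > quantiles))
```

Searching only inside (0, alpha] is a simplification: a single test at level u > alpha would already exceed the target level, so nothing is lost by restricting the grid.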

2.1 Power of the Test 

Both the practical and theoretical performances of the proposed test depend strongly on the orthogonal
system we adopt to generate the collection of linear subspaces $\{S_k\}_{k \in K}$. In dealing with a density
model, Fromont and Laurent [22] consider a collection obtained by mixing spaces generated by piecewise
constant functions (Haar basis), scaling functions from a wavelet basis, and, in the case of compactly
supported densities, trigonometric polynomials. Clearly these bases are not orthonormal in our weighted
space $L^2([0,1], G_X)$, hence we have to consider other options.

The first possibility that comes to mind is to use one of the usual wavelet bases since, as proved by
Haroske and Triebel in [34] (see also [24] and [52]), these systems continue to be unconditional Schauder
bases for a whole family of weighted Besov spaces once we put some polynomial restriction on the growth
of the weight function.

Although appealing, this approach has some evident drawbacks once applied to our setting, from a
theoretical (we must impose some counterintuitive conditions on the marginal $G_X(\cdot)$) as well as a practical
(we cannot use the well-known fast wavelet transform anymore, see [?]) point of view.

As one can see looking at the proofs of Section 4, a basis that proved to fit perfectly in the present
framework is the so-called warped wavelet basis studied by Kerkyacharian and Picard in [51, 50]. The
idea is as follows. For a signal observed at some design points, $Y(t_i)$, $i \in \{1,\dots,2^J\}$, if the design is
regular ($t_k = k/2^J$), the standard wavelet decomposition algorithm starts with $s_{J,k} = 2^{J/2} Y(k/2^J)$ which
approximates the scaling coefficient $\int Y(x) \phi_{J,k}(x)\,dx$, with $\phi_{J,k}(x) = 2^{J/2} \phi(2^J x - k)$ and
$\phi(\cdot)$ the so-called scaling function or father wavelet (see [57] for further information). Then the cascade
algorithm is employed to obtain the wavelet coefficients $d_{j,k}$ for $j < J$, which in turn are thresholded. If the
design is not regular, and we still employ the same algorithm, then for a function $H(\cdot)$ such that
$H(k/2^J) = t_k$, we have $s_{J,k} = 2^{J/2} Y(H(k/2^J))$. Essentially what we are doing is to decompose, with respect
to a standard wavelet basis, the function $Y(H(x))$ or, if $G \circ H(x) = x$, the original function $Y(x)$ itself
but with respect to a new warped basis $\{\psi_{j,k}(G(\cdot))\}_{(j,k)}$.

In the regression setting, this means replacing the standard wavelet expansion of the function $f(\cdot)$ by its
expansion on the new basis $\{\psi_{j,k}(G(\cdot))\}_{(j,k)}$, where $G(\cdot)$ adapts to the design: it may be the
distribution function of the design $G_X(\cdot)$, or its estimate when it is unknown (not our case). An appealing
feature of this method is that it does not need a new algorithm to be implemented: just standard and widespread
tools (we will use this nice feature of the warped bases in the companion paper [9]).

It is important to notice that a warped wavelet basis is, automatically, an orthonormal system in
$L^2([0,1], G_X)$. In fact, if, for ease of notation, we index the basis functions by means of the set
$\mathcal{D} = \mathcal{D}([0,1])$ of dyadic cubes of $\mathbb{R}$ contained in $[0,1]$, i.e. we set
$\psi_{j,k}(\cdot) = \psi_I(\cdot)$, then for each $I_1, I_2$ in $\mathcal{D}$ we have

$$\langle \psi_{I_1} \circ G_X, \psi_{I_2} \circ G_X \rangle_{L^2(G_X)} = \int \psi_{I_1}(G_X(x))\, \psi_{I_2}(G_X(x))\, dG_X(x) = \int \psi_{I_1}(y)\, \psi_{I_2}(y)\, dy = \delta_{I_1, I_2},$$

where the last equality comes from the fact that we can build our warped basis from a (boundary corrected)
wavelet system, orthonormal with respect to the Lebesgue measure in $[0,1]$ (see [?], and Chapter 7.5
in [57]).
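
The change of variable behind this orthonormality can be checked numerically. The following sketch is an illustration only: it uses Haar scaling functions at a single level J (one of the simplest systems orthonormal for the Lebesgue measure on [0,1]) and a hypothetical design with distribution function G(x) = (x + x^2)/2, whose density 1/2 + x is bounded from below and above. The Gram matrix of the warped functions in L^2(G_X) is approximated with a midpoint rule.

```python
import numpy as np

def haar_scaling(J, k, y):
    """Haar scaling function phi_{J,k}(y) = 2^{J/2} 1{k <= 2^J y < k + 1}."""
    y = np.asarray(y, dtype=float)
    return 2.0 ** (J / 2) * ((2.0 ** J * y >= k) & (2.0 ** J * y < k + 1))

def G(x):
    """Illustrative design distribution function on [0, 1] (an assumption)."""
    return 0.5 * (x + x ** 2)

def g(x):
    """Density of G: bounded between 0.5 and 1.5 on [0, 1]."""
    return 0.5 + x

def warped_gram(J, m=200_000):
    """Midpoint-rule approximation of <phi_{J,k} o G, phi_{J,k'} o G> in L^2(G_X)."""
    x = (np.arange(m) + 0.5) / m          # midpoints of a regular grid on [0, 1]
    weights = g(x) / m                    # dG_X(x) is approximated by g(x) dx
    basis = np.stack([haar_scaling(J, k, G(x)) for k in range(2 ** J)])
    return (basis * weights) @ basis.T
```

Up to quadrature error, `warped_gram(J)` should be close to the identity matrix, whereas the same computation with the unwarped functions `haar_scaling(J, k, x)` would in general not give the identity under this design.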

Now, to extract a basis out of a warped system, we surely need to impose restrictions on the design
distribution $G_X(\cdot)$. As a matter of fact, it is easy to prove that the orthonormal system
$\{\psi_I(G)\}_{I \in \mathcal{D}}$ or, equivalently, the corresponding system built on the scaling functions at any
fixed resolution level $J$, is total in $L^2([0,1], G_X)$ if $G_X(\cdot)$ is absolutely continuous with density
$g_X(\cdot)$, with respect to the Lebesgue measure, bounded from below and above. Of course, this condition is only
sufficient and unnecessarily stringent but also simple enough to fit our desiderata perfectly.

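At this point, for each $J$, we have a system of scaling functions $\{\phi_{J,k}(G)\}_k$ orthonormal in
$L^2([0,1], G_X)$ that we can use to generate the subspaces $\mathcal{S} = \{S_J\}_{J \in \mathbb{N}}$, where we have
slightly changed the indexing notation: from $k$ to $J$. So let

$$S_J = \mathrm{span}\{\phi_{J,k}(G)\}_{k \in \mathbb{Z}} \quad \text{with} \quad J \in \{0, \dots, J(n)\} = \mathcal{J}_n,$$

and

$$\hat\theta_{n,J} = \frac{2}{n(n-1)} \sum_{i=2}^{n} \sum_{j=1}^{i-1} \sum_{k} [Y_i \phi_{J,k}(G(X_i))] \cdot [Y_j \phi_{J,k}(G(X_j))]
= \frac{2}{n(n-1)} \sum_{i=2}^{n} \sum_{j=1}^{i-1} h_J(Z_i, Z_j),$$

where the inner sum runs over the scaling functions spanning $S_J$. For all $J \in \mathcal{J}_n$, we set

$$R_{n,J} = \hat\theta_{n,J} + \|f_0\|^2_{L^2(G_X)} - \frac{2}{n} \sum_{i=1}^{n} Y_i f_0(X_i).$$

The test statistic we consider is

$$R_\alpha = \sup_{J \in \mathcal{J}_n} \{R_{n,J} - r_{n,J}(u_\alpha)\}, \qquad (4)$$

where $r_{n,J}(u_\alpha)$ is defined in Section 2.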
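
As a rough computational illustration of the statistics just defined, the sketch below evaluates the U-statistic and the associated statistic R_{n,J} with warped Haar scaling functions. It relies on assumptions that are not in the text: the Haar system is used in place of a general boundary-corrected wavelet system, the design distribution function `G` and the value of the squared norm of f_0 in L^2(G_X) (`f0_sq_norm`) are supplied by the user, and observations with G(X_i) = 1 are assigned to the last dyadic cell. The double sum over i > j is computed through the elementary identity 2 * sum_{i<j} a_i a_j = (sum_i a_i)^2 - sum_i a_i^2.

```python
import numpy as np

def warped_haar_design(x, J, G):
    """Matrix A with A[i, k] = phi_{J,k}(G(x_i)) for the Haar scaling functions."""
    u = np.clip(G(np.asarray(x, dtype=float)), 0.0, 1.0)
    cell = np.minimum((2 ** J * u).astype(int), 2 ** J - 1)   # dyadic cell containing G(x_i)
    A = np.zeros((len(u), 2 ** J))
    A[np.arange(len(u)), cell] = 2.0 ** (J / 2)
    return A

def theta_hat(x, y, J, G):
    """U-statistic 2/(n(n-1)) * sum_{i>j} sum_k [Y_i phi_{J,k}(G(X_i))][Y_j phi_{J,k}(G(X_j))]."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    a = y[:, None] * warped_haar_design(x, J, G)              # a[i, k] = Y_i phi_{J,k}(G(X_i))
    col_sum = a.sum(axis=0)
    col_sq = (a ** 2).sum(axis=0)
    return float(np.sum(col_sum ** 2 - col_sq) / (n * (n - 1)))

def r_stat(x, y, J, G, f0, f0_sq_norm):
    """R_{n,J} = theta_hat + ||f0||^2 - (2/n) sum_i Y_i f0(X_i)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return theta_hat(x, y, J, G) + f0_sq_norm - 2.0 * float(np.mean(y * f0(x)))
```

The vectorised form costs O(n 2^J) operations instead of the O(n^2 2^J) of the literal double sum, which matters when the statistic has to be recomputed many times inside a bootstrap loop.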

The following theorem, which mimics Theorem 1 in [22], describes the class of alternatives over which
the test has a prescribed power. The proof can be found in Section 4.1.


Theorem 2.1 Let $\{Z_i = (X_i, Y_i)\}_{i \in \{1,\dots,n\}}$ be an i.i.d. sequence from the distribution of a vector
$Z = (X, Y)$ described structurally by the nonparametric regression model

$$Y = f(X) + \varepsilon,$$

for $(X, \varepsilon)$ a random vector with $\mathbb{E}(\varepsilon|X) = 0$ and $\mathbb{E}(\varepsilon^2|X) < +\infty$.
Assume further that $f_0(\cdot)$ and the unknown regression function $f(\cdot)$ belong to $L^2([0,1], G_X)$ for
$G_X(\cdot)$ the marginal distribution of $X$, assumed known and absolutely continuous with density $g_X(\cdot)$
bounded from below and above. Finally assume that $|f(x) - y|$ is uniformly bounded (almost everywhere) by a positive
constant $M$.

Now let $\beta \in (0,1)$. For all $\gamma \in (0,2)$, there exist positive constants $C_1 = C_1(\beta)$ and
$C_2 = C_2(\beta, \gamma, \tau_\infty, M, \|f_0\|_\infty)$ such that, defining

n [ n ] n 


1 






with $\tau_\infty = \|f\|^2_\infty + \|\sigma^2\|_\infty$, then, for every $f(\cdot)$ such that




the following inequality holds:

$$P^{\otimes n}_f \{R_\alpha \le 0\} \le \beta.$$


2.2 Uniform Separation Rates 

Now that we know against what kind of alternatives our multiple testing procedure has guaranteed
power, we can move on and examine the problem of establishing uniform separation rates over well-suited
functional classes included in $L^2([0,1], G_X)$. We will start by defining, for all $s > 0$, $R > 0$, and $M > 0$,
the following (linear) approximation space (see the review by DeVore [?]):

$$\mathcal{A}^s(R, M, G_X) = \Big\{ w \in L^2([0,1], G_X) : \|w\|_\infty \le M, \ \text{and} \ \|w - \Pi_{S_J}(w)\|^2_{L^2(G_X)} \le R^2\, 2^{-2Js} \ \text{for all } J \Big\}. \qquad (5)$$

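When $dG_X(x) = dx$ is the Lebesgue measure, $\mathcal{A}^s(R, M, dx)$ is strictly related to the following Besov body

$$\mathcal{B}^s_{2,\infty}(R) = \Big\{ w \in L^2(dx) : \sum_{k \in \mathbb{Z}} \beta_{j,k}^2 \le R^2\, 2^{-2js} \ \text{for all } j, \ \text{with} \ \beta_{j,k} = \langle w, \psi_{j,k} \rangle_{L^2(dx)} \Big\},$$

since

$$\mathcal{B}^s_{2,\infty}\big(R \sqrt{1 - 4^{-s}}\big) \cap \{w : \|w\|_\infty \le M\} \subset \mathcal{A}^s(R, M, dx).$$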

In our case, instead, it is a bit less clear how to "visualize" the content of $\mathcal{A}^s(R, M, G_X)$ in terms
of common smoothness classes like Besov, Hölder or Sobolev bodies that admit alternative definitions in terms of
geometric quantities like the modulus of smoothness (see [?]). The easiest way, probably, is to notice that,
for each $w \in L^2([0,1], G_X)$,

$$\|w - \Pi_{S_J}(w)\|^2_{L^2(G_X)} = \big\| w(G^{-1}) - \Pi_{S_J}\big(w(G^{-1})\big) \big\|^2_{L^2(dx)},$$

where the norm on the right hand side is taken with respect to the Lebesgue measure and

$$G^{-1}(x) = \inf\{t \in \mathbb{R} : G_X(t) \ge x\}$$

is the quantile function of the design distribution $G_X(\cdot)$. Consequently,

$$f \in \mathcal{A}^s(R, M, G_X) \iff f(G^{-1}) \in \mathcal{A}^s(R, M, dx) \supset \mathcal{B}^s_{2,\infty}\big(R \sqrt{1 - 4^{-s}}\big) \cap \{f : \|f\|_\infty \le M\},$$

so that the regularity conditions that hide behind the definition of the approximation space
$\mathcal{A}^s(R, M, G_X)$ could be expressed more explicitly in terms of the warped function $f \circ G^{-1}(\cdot)$,
mixing the smoothness of $f(\cdot)$ with the (very regular, indeed) design $G_X(\cdot)$ (see [51, 50] for further
information and discussions).

The following corollary gives upper bounds for the uniform separation rates of our procedure over
the class $\mathcal{A}^s(R, M, G_X)$.

Corollary 2.2 Let $R_\alpha$ be the test statistic defined in Equation (4). Assume that $n > 16$, and
$J \in \mathcal{J}_n = \{0, \dots, J(n)\}$
with
with 

2J(«) — 

[loglog(w)]3' 

Let $\beta \in (0,1)$. For all $s > 0$, $M > 0$, and $R > 0$, there exists some positive constant
$C = C(s, \alpha, \beta, M, \|f_0\|_\infty)$ such that if $f \in \mathcal{A}^s(R, M, G_X)$ and satisfies


II/-/ 0 IIJ. 


(Gx) 


> c 


IR4S+1 


Vloglog(r 


4s 

4s+l 


+ R^ 


(loglog(n))3 


2s 


+ 


log log(n) 


then

$$P^{\otimes n}_f \{R_\alpha \le 0\} \le \beta.$$

In particular, if $R \in [\underline{R}, \overline{R}]$ with

• R = [loglog(n)]® 

• R = -^ 

[loglog(n)]^"+2 

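then there exists some positive constant $C' = C'(s, \alpha, \beta, M, \|f_0\|_\infty)$ such that the uniform
separation rate of the test $\mathbb{1}_{(0,+\infty)}(R_\alpha)$ over $\mathcal{A}^s(R, M, G_X)$ satisfies

$$\rho\big(\mathbb{1}_{(0,+\infty)}(R_\alpha), \mathcal{A}^s(R, M, G_X), \beta\big) \le C'\, R^{\frac{1}{4s+1}} \left( \frac{\sqrt{\log\log(n)}}{n} \right)^{\frac{2s}{4s+1}}.$$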


Remark:

• The separation rate for the problem of testing "$f = 0$" in the classical Gaussian white noise model
$dX(t) = f(t)dt + \varepsilon\, dW(t)$ has been evaluated for different smoothness classes and distances by
Ingster [42], Ermakov [18], Lepski and Spokoiny [55], Ingster and Suslina [45] (see also the monograph
[?], and [56] where Lepski and Tsybakov established the asymptotic separation rate, with
constants, for the $L^\infty$-norm). In [2], instead, Baraud was able to obtain non-asymptotic bounds
on the minimax separation rate in the case of a Gaussian regression model. From Ingster [?], we
know that the minimax rate of testing over Hölderian balls $\mathcal{H}^s(R)$ in a Gaussian white noise model
is equal to $n^{-2s/(1+4s)}$. From Corollary 2.2 it seems that we lose a factor equal to
$(\log\log(n))^{s/(1+4s)}$ when $s > 1/4$ but, as Spokoiny proved in [63] (see also [?]), adaptivity necessarily
costs a logarithmic factor. Therefore we deduce that for $R \in [\underline{R}, \overline{R}]$, our procedure adapts
over the approximation space $\mathcal{A}^s(R, M, G_X)$ at a rate known to be optimal for a particular scale of
Besov spaces.

2.3 Simulation Study

In this section we carry out a brief simulation study to evaluate the performances of the proposed testing
procedure. Figure 1 summarizes the setup. We consider noisy versions of Donoho's Heavy Sine function
[?] corresponding to different signal to noise ratios ranging from 10 to 20, and three different design
distributions that we call Type I, II and III. The sample size is fixed and equal to 512, whereas we choose to
take $\mathrm{card}(\mathcal{J}_n) = 50$. Given the nature of the Heavy Sine function, we focus on alternatives of
the type

$$h(x|K) = K \sin(4 \pi x).$$

Notice that the true regression function was generated by modifying $h(x|4)$. Finally we set $M = 10$.

[Figure 1 panels, each plotted against x in [0, 1]: Donoho's heavy sine; Design Type I :: n = 512 :: s/n = 20;
Design Type II :: n = 512 :: s/n = 15; Design Type III :: n = 512 :: s/n = 10]

Figure 1: The heavy sine function together with realizations from the three designs chosen to perform the
simulation study. With s/n we have denoted the signal to noise ratio.

As in [22], we have chosen a level $\alpha = 0.05$. The value of $u_\alpha$ and the quantiles
$\{r_{n,J}(u_\alpha)\}_{J \in \mathcal{J}_n}$ are estimated by 50000 simulations using a (smoothed) bootstrap
procedure. We use 25000 simulations for the estimation of the $(1 - u)$ quantiles $r_{n,J}(u)$ of the variables

$$R_{n,J} = \hat\theta_{n,J} + \|f_0\|^2_{L^2(G_X)} - \frac{2}{n} \sum_{i=1}^{n} Y_i f_0(X_i),$$

under the hypothesis "$f = f_0$" for $u$ varying on a grid of $(0, \alpha)$, and 25000 simulations for the estimation
of the probabilities

$$P^{\otimes n}_{f_0}\Big[ \sup_{J \in \mathcal{J}_n} \Big\{ \hat\theta_{n,J} + \|f_0\|^2_{L^2(G_X)} - \frac{2}{n} \sum_{i=1}^{n} Y_i f_0(X_i) - r_{n,J}(u) \Big\} > 0 \Big].$$

Table 1 presents the results of our simulation study.

Design      K = 2    K = 4    K = 6    Estim. Lev.
Type I      0.80     0.77     0.84     0.049
Type II     0.58     0.55     0.60     0.053
Type III    0.44     0.37     0.43     0.052

Table 1: Estimated Power and Level for the test in Section 2.2.


3 Discussion 

In this short section we collect some remarks regarding the content of this paper. First of all, it is almost
inevitable to mention the most evident weakness of the proposed approach, i.e. the fact that we assumed
the design distribution $G_X(\cdot)$ to be completely specified. Although there is a vast literature on the so-called
designed experiments where this type of assumption is truly welcome, in the present nonparametric
regression setting it seems desirable to get rid of it, the most natural way being to assume that $G_X(\cdot)$
belongs to some suitable smoothness class. Clearly this class should necessarily be "small" enough so
that we are still able to prove the analogs of Theorem 2.1 and Corollary 2.2. Notice also that an additional
complication we encounter assuming $G_X(\cdot)$ (partially) unknown comes from the fact that now we need
to warp the initial wavelet basis with some, possibly smoothed, version of the empirical distribution
function. See the paper [50] by Picard and Kerkyacharian to have an idea of the intrinsic difficulty of the
problem.

Although it is not as disturbing as the previous one, another hypothesis that we might want to relax is
the one that requires the knowledge of a (uniform) bound over $|y - f(x)|$. A possible way out here seems
to be the use of arguments similar to those adopted by Laurent in [53] to prove her Proposition 2.

Finally, just a word on the simulation study carried out in Section 2.3. Of course this is only a very brief,
although promising, analysis that can be extended in many directions by considering, for instance, other
families of alternatives and regression functions, and possibly a suitable range of sample sizes.
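
4 Proofs for Section 2

4.1 Proof of Theorem 2.1

Let us start by proving the equality $(\star)$ in Equation (2). $\hat\theta_{n,k}$ is a U-statistic of order two for
$\|\Pi_{S_k}(f)\|^2_{L^2(G_X)}$ with kernel

$$h_k(z_1, z_2) = \sum_{\ell \in I_k} [y_1 e_\ell(x_1)] \cdot [y_2 e_\ell(x_2)], \qquad z_i = (x_i, y_i), \ i \in \{1,2\}.$$

Hence, by the Hoeffding decomposition,

$$\hat\theta_{n,k} = \Pi_0(h_k) + \frac{2}{n} \sum_{i=1}^{n} \Pi_1(h_k)(Z_i) + \frac{2}{n(n-1)} \sum_{i=2}^{n} \sum_{j=1}^{i-1} \Pi_2(h_k)(Z_i, Z_j),$$

where

$$\Pi_0(h_k) = \mathbb{E}[h_k(Z_i, Z_j)] \overset{\text{indep.}}{=} \sum_{\ell \in I_k} \mathbb{E}\{Y_i e_\ell(X_i)\} \cdot \mathbb{E}\{Y_j e_\ell(X_j)\}
\overset{\text{id. distr.}}{=} \sum_{\ell \in I_k} \big(\mathbb{E}\{Y e_\ell(X)\}\big)^2 = \sum_{\ell \in I_k} \theta_\ell^2,$$

$$\Pi_1(h_k)(z_i) = \mathbb{E}[h_k(Z_i, Z_j) | Z_i = z_i] - \mathbb{E}[h_k(Z_i, Z_j)]
= \sum_{\ell \in I_k} \big( [y_i e_\ell(x_i)] \cdot \mathbb{E}\{Y_j e_\ell(X_j)\} - \theta_\ell^2 \big)
= \sum_{\ell \in I_k} \theta_\ell [y_i e_\ell(x_i) - \theta_\ell],$$

$$\Pi_2(h_k)(z_i, z_j) = h_k(z_i, z_j) - \mathbb{E}[h_k(Z_i, Z_j) | Z_i = z_i] - \mathbb{E}[h_k(Z_i, Z_j) | Z_j = z_j] + \mathbb{E}[h_k(Z_i, Z_j)]$$
$$= \sum_{\ell \in I_k} \big( [y_i e_\ell(x_i)] \cdot [y_j e_\ell(x_j)] - \theta_\ell [y_i e_\ell(x_i)] - \theta_\ell [y_j e_\ell(x_j)] + \theta_\ell^2 \big)
= \sum_{\ell \in I_k} [y_i e_\ell(x_i) - \theta_\ell] \cdot [y_j e_\ell(x_j) - \theta_\ell].$$

Hence

$$\hat\theta_{n,k} = \sum_{\ell \in I_k} \theta_\ell^2 + \frac{2}{n} \sum_{i=1}^{n} \sum_{\ell \in I_k} \theta_\ell [Y_i e_\ell(X_i) - \theta_\ell]
+ \frac{2}{n(n-1)} \sum_{i=2}^{n} \sum_{j=1}^{i-1} \sum_{\ell \in I_k} [Y_i e_\ell(X_i) - \theta_\ell] \cdot [Y_j e_\ell(X_j) - \theta_\ell].$$

Now note the following equivalences implied by the orthonormality of the system $\{e_\ell\}_{\ell \in I_k}$ in
$L^2([0,1], G_X)$:

$$\|\Pi_{S_k}(f)\|^2_{L^2(G_X)} = \Big\| \sum_{\ell \in I_k} \theta_\ell e_\ell \Big\|^2_{L^2(G_X)} \overset{\text{orthonorm.}}{=} \sum_{\ell \in I_k} \theta_\ell^2,$$

$$2(\mathbb{P}_n - P)(\Pi_{S_k}(f)) = 2 \Big\{ \frac{1}{n} \sum_{i=1}^{n} Y_i\, \Pi_{S_k}(f)(X_i) - \langle \Pi_{S_k}(f), f \rangle_{L^2(G_X)} \Big\}
= 2 \Big\{ \frac{1}{n} \sum_{i=1}^{n} \sum_{\ell \in I_k} \theta_\ell Y_i e_\ell(X_i) - \sum_{\ell \in I_k} \theta_\ell^2 \Big\}
= \frac{2}{n} \sum_{i=1}^{n} \sum_{\ell \in I_k} \theta_\ell [Y_i e_\ell(X_i) - \theta_\ell].$$

So

$$\hat\theta_{n,k} = \|\Pi_{S_k}(f)\|^2_{L^2(G_X)} + 2(\mathbb{P}_n - P)(\Pi_{S_k}(f)) + U_{n,k}.$$

Finally, by using the fact that

$$\|f - \Pi_{S_k}(f)\|^2_{L^2(G_X)} = \|f\|^2_{L^2(G_X)} + \|\Pi_{S_k}(f)\|^2_{L^2(G_X)} - 2 \|\Pi_{S_k}(f)\|^2_{L^2(G_X)} = \|f\|^2_{L^2(G_X)} - \|\Pi_{S_k}(f)\|^2_{L^2(G_X)},$$

from the definition of $R_{n,k}$ in Equation (2) we obtain

$$R_{n,k} = \hat\theta_{n,k} + \|f_0\|^2_{L^2(G_X)} - 2\mathbb{P}_n(f_0)$$
$$= U_{n,k} + 2(\mathbb{P}_n - P)(\Pi_{S_k}(f) - f) + 2(\mathbb{P}_n - P)(f) + \big( \|f\|^2_{L^2(G_X)} - \|f - \Pi_{S_k}(f)\|^2_{L^2(G_X)} \big) + \|f_0\|^2_{L^2(G_X)} - 2\mathbb{P}_n(f_0)$$
$$= U_{n,k} + 2(\mathbb{P}_n - P)(\Pi_{S_k}(f) - f) - \|f - \Pi_{S_k}(f)\|^2_{L^2(G_X)} + \big( \|f_0\|^2_{L^2(G_X)} + \|f\|^2_{L^2(G_X)} - 2P(f_0) \big) + \big[ 2(\mathbb{P}_n - P)(f) - 2(\mathbb{P}_n - P)(f_0) \big]$$
$$= U_{n,k} + 2(\mathbb{P}_n - P)(\Pi_{S_k}(f) - f) - \|f - \Pi_{S_k}(f)\|^2_{L^2(G_X)} + 2(\mathbb{P}_n - P)(f - f_0) + \|f - f_0\|^2_{L^2(G_X)},$$

and this completes the proof.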
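
The three-term decomposition of the U-statistic used in this proof is a purely algebraic identity: for any vector of centering constants, the constant, linear and degenerate parts add up to the original statistic, and only their centering and orthogonality depend on using the true Fourier coefficients. The short check below illustrates this on arbitrary numbers; the array `a` stands for the products Y_i e_l(X_i) and `theta` for candidate coefficients, both generated at random as an assumption for the sake of the example.

```python
import numpy as np

def u_statistic(a):
    """2/(n(n-1)) * sum_{i>j} sum_l a[i, l] * a[j, l], computed without a double loop."""
    n = a.shape[0]
    col_sum = a.sum(axis=0)
    col_sq = (a ** 2).sum(axis=0)
    return float(np.sum(col_sum ** 2 - col_sq) / (n * (n - 1)))

def hoeffding_terms(a, theta):
    """Constant, linear and degenerate parts of the decomposition.

    a[i, l] plays the role of Y_i e_l(X_i); theta[l] plays the role of theta_l.
    """
    n = a.shape[0]
    constant = float(np.sum(theta ** 2))                   # sum_l theta_l^2
    linear = float(2.0 / n * np.sum((a - theta) * theta))  # (2/n) sum_i sum_l theta_l (a_il - theta_l)
    degenerate = u_statistic(a - theta)                    # U_{n,k}
    return constant, linear, degenerate
```

Calling `sum(hoeffding_terms(a, theta))` and `u_statistic(a)` on the same inputs should return the same number up to rounding error.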

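Now, given $\beta \in (0,1)$ we know that

$$P^{\otimes n}_f \{R_\alpha \le 0\} = P^{\otimes n}_f \Big\{ \sup_{J \in \mathcal{J}_n} \Big[ \hat\theta_{n,J} + \|f_0\|^2_{L^2(G_X)} - \frac{2}{n} \sum_{i=1}^{n} Y_i f_0(X_i) - r_{n,J}(u_\alpha) \Big] \le 0 \Big\},$$

hence

$$P^{\otimes n}_f \{R_\alpha \le 0\} \le \inf_{J \in \mathcal{J}_n} P^{\otimes n}_f \{R_{n,J} - r_{n,J}(u_\alpha) \le 0\}$$
$$= \inf_{J \in \mathcal{J}_n} P^{\otimes n}_f \Big\{ U_{n,J} + 2(\mathbb{P}_n - P)(\Pi_{S_J}(f) - f) - \|f - \Pi_{S_J}(f)\|^2_{L^2(G_X)} + 2(\mathbb{P}_n - P)(f - f_0) + \|f - f_0\|^2_{L^2(G_X)} - r_{n,J}(u_\alpha) \le 0 \Big\}. \qquad (6)$$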

Following [22], we will split the control of the power in three steps, involving separately $U_{n,J}$,
$2(\mathbb{P}_n - P)(\Pi_{S_J}(f) - f)$, and $2(\mathbb{P}_n - P)(f - f_0)$. To handle the last two terms, we will use
the following version of Bernstein's inequality with constants provided by Birgé and Massart in [7]:

Lemma 4.1 Let $\{U_i\}_{i \in \{1,\dots,n\}}$ be independent random variables such that for all $i \in \{1,\dots,n\}$

• $|U_i| \le b$,

• $\mathbb{E}(U_i^2) \le \delta^2$.

Then, for all $u > 0$,

$$P\Big( \frac{1}{n} \sum_{i=1}^{n} \big(U_i - \mathbb{E}(U_i)\big) > \frac{\sqrt{2}\, \delta}{\sqrt{n}} \sqrt{u} + \frac{b}{3n}\, u \Big) \le e^{-u}. \qquad (7)$$
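
A quick Monte Carlo experiment can make the content of this Bernstein-type bound concrete. The sketch below is an illustration under stated assumptions: it takes the inequality in the form P( (1/n) sum_i (U_i - E U_i) > delta * sqrt(2u/n) + b*u/(3n) ) <= exp(-u), and it uses variables uniform on [-1, 1], for which b = 1 and delta^2 = 1/3. The function names are hypothetical.

```python
import numpy as np

def bernstein_threshold(u, n, delta, b):
    """Deviation level delta * sqrt(2u/n) + b*u/(3n) appearing in the bound."""
    return delta * np.sqrt(2.0 * u / n) + b * u / (3.0 * n)

def empirical_tail(n, u, reps=20_000, seed=0):
    """Monte Carlo estimate of P(mean of centered U_i > threshold) for U_i ~ Uniform[-1, 1]."""
    rng = np.random.default_rng(seed)
    samples = rng.uniform(-1.0, 1.0, size=(reps, n))   # |U_i| <= 1, E U_i = 0, E U_i^2 = 1/3
    means = samples.mean(axis=1)
    threshold = bernstein_threshold(u, n, delta=np.sqrt(1.0 / 3.0), b=1.0)
    return float(np.mean(means > threshold))
```

For each u, the value returned by `empirical_tail(n, u)` can be compared with `np.exp(-u)`; the bound is not expected to be tight, so the simulated frequency should typically sit well below it.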

4.1.1 • Control of $U_{n,J}$

We start with the following lemma whose proof is postponed to Section 4.3.


Lemma 4.2 Under the hypotheses and using the notation of Theorem 2.1, there exists an absolute constant $\kappa_0$
such that, for all $u > 0$, we have


/ 


""Linj|>- 
' n 


I + TooM +M^ 


<5.6e““. 


Now, let u\ = u\{f) = log(3/ f), and u\\ = u\\{f) = u\ + log(5.6). Then, from Equation (j^, we obtain 

.ul2^ 




^Ju\\2l + TooU\\ +M" 




where Too = ||/||L + \\o^ 


4.1.2 • Control of 2(P„ - P)(nsj (/) - /) 

In order to apply Lemma [4lj let 

LJ = 2Y[ns,(/)(X)-/(X)], 

then 


\U\ = 


2Y 


[ns,(/)(X)-/(X)]U2M| sup |ns,(/)(x)|- sup |/(x)|l =4M||/||o 

[xe[0,l] x6[0,l] j 


e(lj 2)^ E(4r2(ns,(/)-/f (x)}^ 

• = 4E ([/2(X) + a2(X)] (ns, (/) - ff (X)} < TtooE ((Hs, (/) - ff (X)} 

Hence, applying Lemma |4T[ we have 


p«p(p„-P)(ns,{/)-/)< - ^'^^ iins,(/)-/ii,.(o,) 


3n 


< e““. 


By the inequality 2ab < + jb^ we then have 


2Too^ 


n 




(Gx)' 


and consequently 


p“ 2 (p„ - P) (ns, (/) - /) + Jms, (/) - /||2 


HGx) 


< - 


Finally, taking U] = u\{f) = log(3/ f), as before, we get 


p“ 2 (p„ - P) (ns, (/) - /) + y ms, (/) - fill 


~(Gx) 


< 


8 4 

-TooT-MII/IIo 
7 3 


8 4 

-TooT-MII/IIo 
7 3 


-1 <e“ 




( 8 ) 


(9) 


( 10 ) 
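The weighted inequality used in Section 4.1.2 is an instance of the elementary bound $2ab \le \frac{4}{\gamma}a^2 + \frac{\gamma}{4}b^2$. A short derivation, written out here only as a worked check and under the assumption $\gamma > 0$, is:

```latex
% Completing the square, for a, b real and gamma > 0:
0 \le \Big(\tfrac{2}{\sqrt{\gamma}}\,a - \tfrac{\sqrt{\gamma}}{2}\,b\Big)^2
  = \tfrac{4}{\gamma}\,a^2 - 2ab + \tfrac{\gamma}{4}\,b^2
\quad\Longrightarrow\quad
2ab \le \tfrac{4}{\gamma}\,a^2 + \tfrac{\gamma}{4}\,b^2 .
% With a = \sqrt{2\tau_\infty u / n} and b = \|\Pi_{S_J}(f) - f\|_{L^2(G_X)}:
\tfrac{4}{\gamma}\,a^2 = \tfrac{8}{\gamma}\,\tfrac{\tau_\infty u}{n},
\qquad
\tfrac{\gamma}{4}\,b^2 = \tfrac{\gamma}{4}\,\|\Pi_{S_J}(f) - f\|^2_{L^2(G_X)} .
```

This is where the factor $8/\gamma$ in front of $\tau_\infty$ comes from; the same computation applies verbatim with $b = \|f - f_0\|_{L^2(G_X)}$.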
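4.1.3 • Control of $2(P_n - P)(f - f_0)$

Proceeding as in the previous section, let

$$U = 2Y\big[f(X) - f_0(X)\big],$$

then

$$|U| = \big|2Y[f(X) - f_0(X)]\big| \le 2M\big(\|f\|_\infty + \|f_0\|_\infty\big),$$

$$\mathbb{E}(U^2) = \mathbb{E}\big(4Y^2(f - f_0)^2(X)\big) = 4\,\mathbb{E}\big([f^2(X) + \sigma^2(X)](f - f_0)^2(X)\big) \le 4\tau_\infty\,\mathbb{E}\big((f - f_0)^2(X)\big) = 4\tau_\infty\|f - f_0\|^2_{L^2(G_X)}.$$

Hence, applying Lemma 4.1 we have

$$\mathbb{P}_f^{\otimes n}\left\{2(P_n - P)(f - f_0) < -\frac{2\sqrt{2\tau_\infty u}}{\sqrt{n}}\,\|f - f_0\|_{L^2(G_X)} - \frac{2M(\|f\|_\infty + \|f_0\|_\infty)}{3n}\,u\right\} \le e^{-u}.$$

Applying again the inequality $2ab \le \frac{4}{\gamma}a^2 + \frac{\gamma}{4}b^2$ we then have

$$2\sqrt{\frac{2\tau_\infty u}{n}}\,\|f - f_0\|_{L^2(G_X)} \le \frac{8}{\gamma}\,\frac{\tau_\infty u}{n} + \frac{\gamma}{4}\,\|f - f_0\|^2_{L^2(G_X)},$$

and consequently

$$\mathbb{P}_f^{\otimes n}\left\{2(P_n - P)(f - f_0) + \frac{\gamma}{4}\,\|f - f_0\|^2_{L^2(G_X)} < -\left[\frac{8}{\gamma}\tau_\infty + \frac{2}{3}M\{\|f\|_\infty + \|f_0\|_\infty\}\right]\frac{u}{n}\right\} \le e^{-u}.$$

Finally, taking $u_I = u_I(\beta) = \log(3/\beta)$, as always, we get

$$\mathbb{P}_f^{\otimes n}\left\{2(P_n - P)(f - f_0) + \frac{\gamma}{4}\,\|f - f_0\|^2_{L^2(G_X)} < -\left[\frac{8}{\gamma}\tau_\infty + \frac{2}{3}M\{\|f\|_\infty + \|f_0\|_\infty\}\right]\frac{u_I}{n}\right\} \le \frac{\beta}{3}. \tag{11}$$

4.1.4 • Conclusion

Combining Equation (6) with the bounds presented in Equations (9), (10) and (11), we get

$$\begin{aligned}
\mathbb{P}_f^{\otimes n}\{R_\alpha \le 0\} \le \beta + \inf_{J \in \mathcal{J}_n} \mathbb{P}_f^{\otimes n}\Big\{ \|f - f_0\|^2_{L^2(G_X)} \le{}& \|\Pi_{S_J}(f) - f\|^2_{L^2(G_X)} + \frac{\kappa_0}{n}\Big[\tau_\infty\sqrt{u_{II}2^J} + \tau_\infty u_{II} + M^2\frac{u_{II}^2 2^J}{n}\Big] \\
&+ \frac{\gamma}{4}\|\Pi_{S_J}(f) - f\|^2_{L^2(G_X)} + \Big[\frac{8}{\gamma}\tau_\infty + \frac{4}{3}M\|f\|_\infty\Big]\frac{u_I}{n} \\
&+ \frac{\gamma}{4}\|f - f_0\|^2_{L^2(G_X)} + \Big[\frac{8}{\gamma}\tau_\infty + \frac{2}{3}M\{\|f\|_\infty + \|f_0\|_\infty\}\Big]\frac{u_I}{n} + r_{n,J}(u_\alpha)\Big\}.
\end{aligned}$$

So, if there exists $J \in \mathcal{J}_n$ such that

$$\begin{aligned}
\Big(1 - \frac{\gamma}{4}\Big)\|f - f_0\|^2_{L^2(G_X)} >{}& \Big(1 + \frac{\gamma}{4}\Big)\|\Pi_{S_J}(f) - f\|^2_{L^2(G_X)} + \frac{\kappa_0}{n}\Big[\tau_\infty\sqrt{u_{II}2^J} + \tau_\infty u_{II} + M^2\frac{u_{II}^2 2^J}{n}\Big] \\
&+ \Big[\frac{16}{\gamma}\tau_\infty + 2M\Big(\|f\|_\infty + \frac{1}{3}\|f_0\|_\infty\Big)\Big]\frac{u_I}{n} + r_{n,J}(u_\alpha),
\end{aligned}$$

then

$$\mathbb{P}_f^{\otimes n}\{R_\alpha \le 0\} \le \beta,$$

and this completes the proof of Theorem 2.1.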
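To make the sufficient condition at the end of the proof of Theorem 2.1 concrete, the sketch below evaluates its right-hand side for given inputs. All names are hypothetical, and the absolute constant $\kappa_0$ is not given numerically in the text, so the default `kappa0=1.0` is a placeholder assumption; the squared bias $\|\Pi_{S_J}(f)-f\|^2$ and the quantile $r_{n,J}(u_\alpha)$ are simply passed in as numbers.

```python
import math


def separation_threshold(n, J, beta, gamma, tau_inf, M, f_sup, f0_sup,
                         bias_sq, r_quantile, kappa0=1.0):
    """Smallest value of ||f - f0||^2 allowed by the sufficient condition of
    Section 4.1.4 for a fixed level J (kappa0 = 1.0 is a placeholder)."""
    u1 = math.log(3.0 / beta)          # u_I
    u2 = u1 + math.log(5.6)            # u_II
    # Contribution of the degenerate U-statistic, Equation (9).
    u_stat = (kappa0 / n) * (tau_inf * math.sqrt(2.0 ** J * u2)
                             + tau_inf * u2
                             + M ** 2 * 2.0 ** J * u2 ** 2 / n)
    # Sum of the two Bernstein remainders, Equations (10) and (11).
    linear = (16.0 / gamma * tau_inf
              + 2.0 * M * (f_sup + f0_sup / 3.0)) * u1 / n
    rhs = (1.0 + gamma / 4.0) * bias_sq + u_stat + linear + r_quantile
    return rhs / (1.0 - gamma / 4.0)


if __name__ == "__main__":
    for n in (100, 10_000):
        print(n, separation_threshold(n, J=3, beta=0.05, gamma=1.0,
                                      tau_inf=2.0, M=2.0, f_sup=1.0,
                                      f0_sup=1.0, bias_sq=0.01,
                                      r_quantile=0.0))
```

With the bias and quantile held fixed, the threshold decreases as $n$ grows, which is the qualitative behaviour the theorem describes.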

4.2 Proof of Corollary 2.2

We will split the proof of Corollary 2.2 in two parts: in the first one we will bound $r_{n,J}(u_\alpha)$, the $(1 - u_\alpha)$ quantile under the null hypothesis of the test statistic $R_{n,J}$; whereas in the second one, we shall use this bound together with Theorem 2.1 to complete the proof.

4.2.1 • Upper bound for $r_{n,J}(u_\alpha)$, $J \in \mathcal{J}_n$

In this section we will prove the following lemma:

Lemma 4.3 Under the hypotheses and using the notation of Corollary 2.2, there exists a positive constant $C(\alpha)$ such that

$$r_{n,J}(u_\alpha) \le \bar r_{n,J}(\alpha),$$

where

$$\bar r_{n,J}(\alpha) = \frac{C(\alpha)}{n}\left\{\tau_{0,\infty}\,2^{J/2}\sqrt{\log\log(n)} + 2\Big[\tau_{0,\infty} + \frac{2}{3}M\|f_0\|_\infty\Big]\log\log(n) + M^2 2^J\,\frac{[\log\log(n)]^2}{n}\right\},$$

with $\tau_{0,\infty} = \|f_0\|_\infty^2 + \|\sigma^2\|_\infty$.

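Proof. First of all notice that, by hypothesis, and for all $n \in \mathbb{N}$,

$$\mathcal{J}_n = \Big\{0,\dots,\log_2\Big(\frac{n^2}{[\log\log(n)]^3}\Big)\Big\} \;\Rightarrow\; \mathrm{Card}(\mathcal{J}_n) = 1 + \log_2\Big\{\frac{n^2}{[\log\log(n)]^3}\Big\} \le 1 + \log_2(n^2).$$

Hence, under the null "$f = f_0$", and for $\alpha_n = \alpha/[1 + \log_2(n^2)]$, we get

$$\mathbb{P}_{f_0}^{\otimes n}\Big\{\sup_{J \in \mathcal{J}_n}\{R_{n,J} - r_{n,J}(\alpha_n)\} > 0\Big\} \le \sum_{J \in \mathcal{J}_n}\mathbb{P}_{f_0}^{\otimes n}\big\{R_{n,J} - r_{n,J}(\alpha_n) > 0\big\} \le \sum_{J \in \mathcal{J}_n}\frac{\alpha}{[1 + \log_2(n^2)]} \le \alpha.$$

Consequently,

$$\alpha_n \le \sup\Big\{u \in (0,1) : \mathbb{P}_{f_0}^{\otimes n}\Big(\sup_{J \in \mathcal{J}_n}\{R_{n,J} - r_{n,J}(u)\} > 0\Big) \le \alpha\Big\} = u_\alpha.$$

Since the quantile $r_{n,J}(u)$ is nonincreasing in $u$, this gives $r_{n,J}(u_\alpha) \le r_{n,J}(\alpha_n)$. Hence, all we have to do is to find an upper bound for

$$r_{n,J}(\alpha_n) = r_{n,J}\Big(\frac{\alpha}{[1 + \log_2(n^2)]}\Big).$$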
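The Bonferroni-type calibration used at the start of the proof of Lemma 4.3 can be checked numerically. The sketch below is only an illustration: it assumes natural logarithms and that the largest index in $\mathcal{J}_n$ is the integer part of $\log_2(n^2/[\log\log(n)]^3)$, and the function name is hypothetical.

```python
import math


def index_set_and_level(n, alpha):
    """Return (largest level, Card(J_n), alpha_n) for a sample size n.

    Assumes n is large enough that log(log(n)) > 0, and that the upper
    end of J_n is rounded down to an integer (an assumption of this sketch).
    """
    loglog = math.log(math.log(n))
    j_max = math.floor(math.log2(n ** 2 / loglog ** 3))
    card = j_max + 1                              # J_n = {0, ..., j_max}
    alpha_n = alpha / (1.0 + math.log2(n ** 2))   # per-level nominal level
    return j_max, card, alpha_n


if __name__ == "__main__":
    for n in (100, 1000, 10 ** 6):
        j_max, card, alpha_n = index_set_and_level(n, alpha=0.05)
        # Union bound: Card(J_n) * alpha_n never exceeds alpha.
        print(n, j_max, card, alpha_n, card * alpha_n)
```

Because $\mathrm{Card}(\mathcal{J}_n) \le 1 + \log_2(n^2)$, the product `card * alpha_n` stays below $\alpha$, which is exactly the union-bound step in the proof.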
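Working under the null, from the decomposition of $R_{n,J}$ derived in the proof of Theorem 2.1 (with $f = f_0$) we obtain

$$R_{n,J} = U_{n,J} + 2(P_n - P)\big(\Pi_{S_J}(f_0) - f_0\big) - \|\Pi_{S_J}(f_0) - f_0\|^2_{L^2(G_X)}.$$

At this point, once we set $u_{n,I} = \log(2/\alpha_n)$ and $u_{n,II} = u_{n,I} + \log(5.6)$, we can proceed as in the proof of Theorem 2.1 obtaining the following bounds:

• By Lemma 4.2,

$$\mathbb{P}_{f_0}^{\otimes n}\left\{U_{n,J} > \frac{\kappa_0}{n}\Big[\tau_{0,\infty}\sqrt{2^J u_{n,II}} + \tau_{0,\infty}u_{n,II} + M^2\frac{2^J u_{n,II}^2}{n}\Big]\right\} \le \frac{\alpha_n}{2}.$$

• By Lemma 4.1 and using the inequality $2ab \le a^2 + b^2$,

$$\mathbb{P}_{f_0}^{\otimes n}\left\{2(P_n - P)\big(\Pi_{S_J}(f_0) - f_0\big) - \|\Pi_{S_J}(f_0) - f_0\|^2_{L^2(G_X)} > 2\Big[\tau_{0,\infty} + \frac{2}{3}M\|f_0\|_\infty\Big]\frac{u_{n,I}}{n}\right\} \le \frac{\alpha_n}{2}.$$

Combining these two inequalities we get

$$\mathbb{P}_{f_0}^{\otimes n}\left\{R_{n,J} > \frac{\kappa_0}{n}\Big[\tau_{0,\infty}\sqrt{2^J u_{n,II}} + \tau_{0,\infty}u_{n,II} + M^2\frac{2^J u_{n,II}^2}{n}\Big] + 2\Big[\tau_{0,\infty} + \frac{2}{3}M\|f_0\|_\infty\Big]\frac{u_{n,I}}{n}\right\} \le \alpha_n.$$

Finally, it is easy to see that we can find two constants $C'(\alpha)$ and $C''(\alpha)$ such that $u_{n,I} \le C'(\alpha)\log\log(n)$ and $u_{n,II} \le C''(\alpha)\log\log(n)$, therefore

$$\begin{aligned}
&\frac{\kappa_0}{n}\Big[\tau_{0,\infty}\sqrt{2^J u_{n,II}} + \tau_{0,\infty}u_{n,II} + M^2\frac{2^J u_{n,II}^2}{n}\Big] + 2\Big[\tau_{0,\infty} + \frac{2}{3}M\|f_0\|_\infty\Big]\frac{u_{n,I}}{n} \\
&\quad\le \frac{1}{n}\Big\{\kappa_0\sqrt{C''(\alpha)}\,\tau_{0,\infty}2^{J/2}\sqrt{\log\log(n)} + \kappa_0 C''(\alpha)\,\tau_{0,\infty}\log\log(n) + \kappa_0[C''(\alpha)]^2 M^2 2^J\frac{[\log\log(n)]^2}{n} \\
&\qquad\qquad + 2C'(\alpha)\Big[\tau_{0,\infty} + \frac{2}{3}M\|f_0\|_\infty\Big]\log\log(n)\Big\} \\
&\quad\le \frac{C(\alpha)}{n}\Big\{\tau_{0,\infty}2^{J/2}\sqrt{\log\log(n)} + \tau_{0,\infty}\log\log(n) + M^2 2^J\frac{[\log\log(n)]^2}{n} + \Big[\tau_{0,\infty} + \frac{2}{3}M\|f_0\|_\infty\Big]\log\log(n)\Big\} \\
&\quad\le \frac{C(\alpha)}{n}\Big\{\tau_{0,\infty}2^{J/2}\sqrt{\log\log(n)} + 2\Big[\tau_{0,\infty} + \frac{2}{3}M\|f_0\|_\infty\Big]\log\log(n) + M^2 2^J\frac{[\log\log(n)]^2}{n}\Big\},
\end{aligned}$$

where $C(\alpha) = \max\big\{\kappa_0\sqrt{C''(\alpha)},\, \kappa_0 C''(\alpha),\, \kappa_0[C''(\alpha)]^2,\, 2C'(\alpha)\big\}$. And this completes the proof.

□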


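4.2.2 • Separation rates

Combining Theorem 2.1 and Lemma 4.3, for each $\beta \in (0,1)$ we get that

$$\mathbb{P}_f^{\otimes n}\{R_\alpha \le 0\} \le \beta,$$

for every $f(\cdot)$ such that

$$\|f - f_0\|^2_{L^2(G_X)} > (1 + \gamma)\inf_{J \in \mathcal{J}_n}\Big\{\|f - \Pi_{S_J}(f)\|^2_{L^2(G_X)} + \bar r_{n,J}(\alpha) + V_{n,J}(\beta)\Big\}.$$

Now, assuming that $f$ belongs to the class with parameters $(R, M, G_X)$ considered in Corollary 2.2, so that $\|f - \Pi_{S_J}(f)\|^2_{L^2(G_X)} \le R^2 2^{-2Js}$, the infimum on the right-hand side of the last inequality is bounded by

$$\begin{aligned}
&\inf_{J \in \mathcal{J}_n}\Big\{R^2 2^{-2Js} + \frac{C(\alpha)}{n}\Big\{\tau_{0,\infty}2^{J/2}\sqrt{\log\log(n)} + 2\Big[\tau_{0,\infty} + \frac{2}{3}M\|f_0\|_\infty\Big]\log\log(n) + M^2 2^J\frac{[\log\log(n)]^2}{n}\Big\} \\
&\qquad\qquad + \frac{C_1}{n}\Big\{\tau_\infty 2^{J/2} + \frac{M^2 2^J}{n}\Big\} + \frac{C_2}{n}\Big\} \\
&= \inf_{J \in \mathcal{J}_n}\Big\{R^2 2^{-2Js} + C_1\tau_\infty\frac{2^{J/2}}{n} + C(\alpha)\tau_{0,\infty}\frac{\sqrt{2^J\log\log(n)}}{n} + C_1 M^2\frac{2^J}{n^2} + C(\alpha)M^2\frac{2^J[\log\log(n)]^2}{n^2} \\
&\qquad\qquad + 2C(\alpha)\Big[\tau_{0,\infty} + \frac{2}{3}M\|f_0\|_\infty\Big]\frac{\log\log(n)}{n} + \frac{C_2}{n}\Big\} \\
&\overset{(\diamond)}{\le} \inf_{J \in \mathcal{J}_n}\Big\{R^2 2^{-2Js} + \big[C_1\tau_\infty + C(\alpha)\tau_{0,\infty}\big]\frac{\sqrt{2^J\log\log(n)}}{n} + \big[C_1 M^2 + C(\alpha)M^2\big]\frac{2^J[\log\log(n)]^2}{n^2} \\
&\qquad\qquad + \Big[2C(\alpha)\Big(\tau_{0,\infty} + \frac{2}{3}M\|f_0\|_\infty\Big) + C_2\Big]\frac{\log\log(n)}{n}\Big\},
\end{aligned}$$

where the last inequality, denoted by $(\diamond)$, comes from the fact that, for $n \ge 16$, $1 < \log\log(n)$.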
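Now, since by hypothesis $2^J \le n^2/[\log\log(n)]^3$, we have

$$\frac{2^J[\log\log(n)]^2}{n^2} = \frac{\sqrt{2^J\log\log(n)}}{n}\cdot\frac{2^{J/2}}{n}\,[\log\log(n)]^{3/2} \le \frac{\sqrt{2^J\log\log(n)}}{n}\cdot\frac{n}{[\log\log(n)]^{3/2}}\cdot\frac{[\log\log(n)]^{3/2}}{n} = \frac{\sqrt{2^J\log\log(n)}}{n},$$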
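The comparison between the two variance-type terms can be verified numerically for every admissible level. The following sketch assumes natural logarithms and an index set whose largest element is the integer part of $\log_2(n^2/[\log\log(n)]^3)$; the function names are hypothetical.

```python
import math


def variance_terms(n, J):
    """Return (2^J (loglog n)^2 / n^2, sqrt(2^J loglog n) / n)."""
    ll = math.log(math.log(n))
    quadratic = 2.0 ** J * ll ** 2 / n ** 2
    dominant = math.sqrt(2.0 ** J * ll) / n
    return quadratic, dominant


def quadratic_term_is_dominated(n):
    """Check the inequality for every J with 2^J <= n^2 / (loglog n)^3."""
    ll = math.log(math.log(n))
    j_max = math.floor(math.log2(n ** 2 / ll ** 3))
    return all(q <= d for q, d in
               (variance_terms(n, J) for J in range(j_max + 1)))


if __name__ == "__main__":
    for n in (16, 1000, 10 ** 6):
        print(n, quadratic_term_is_dominated(n))
```

The ratio of the two terms equals $2^{J/2}[\log\log(n)]^{3/2}/n$, so the check succeeds exactly when $2^J \le n^2/[\log\log(n)]^3$, which is the constraint defining $\mathcal{J}_n$.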


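so that

$$\inf_{J \in \mathcal{J}_n}\Big\{\|f - \Pi_{S_J}(f)\|^2_{L^2(G_X)} + \bar r_{n,J}(\alpha) + V_{n,J}(\beta)\Big\} \le C'\inf_{J \in \mathcal{J}_n}\Big\{R^2 2^{-2Js} + 2^{J/2}\frac{\sqrt{\log\log(n)}}{n}\Big\} + C''\,\frac{\log\log(n)}{n},$$

where $C' = 2\cdot\max\{1,\ C_1\tau_\infty + C(\alpha)\tau_{0,\infty},\ M^2[C_1 + C(\alpha)]\}$ and $C'' = \big[2C(\alpha)\big(\tau_{0,\infty} + \frac{2}{3}M\|f_0\|_\infty\big) + C_2\big]$.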

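From this point on, the proof continues as in [22] and it is reported here just for the sake of completeness. First of all notice that

$$R^2 2^{-2Js} < 2^{J/2}\,\frac{\sqrt{\log\log(n)}}{n} \iff 2^J > \left(\frac{n^2R^4}{\log\log(n)}\right)^{\frac{1}{1+4s}}.$$

So define $J^\star$ by

$$J^\star = \left\lfloor\log_2\left(\left\{\frac{n^2R^4}{\log\log(n)}\right\}^{\frac{1}{1+4s}}\right)\right\rfloor + 1.$$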
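The level $J^\star$ is the first resolution at which the bias bound $R^2 2^{-2Js}$ drops below the variance-type term $2^{J/2}\sqrt{\log\log(n)}/n$. A minimal sketch of this balancing, with hypothetical function names and natural logarithms assumed, is:

```python
import math


def balancing_level(n, R, s):
    """J* = floor(log2((n^2 R^4 / loglog n)^(1/(1+4s)))) + 1."""
    ll = math.log(math.log(n))
    x = (n ** 2 * R ** 4 / ll) ** (1.0 / (1.0 + 4.0 * s))
    return math.floor(math.log2(x)) + 1


def bias_and_variance(n, R, s, J):
    """Return (R^2 2^{-2Js}, 2^{J/2} sqrt(loglog n) / n) at level J."""
    ll = math.log(math.log(n))
    bias = R ** 2 * 2.0 ** (-2.0 * J * s)
    variance = 2.0 ** (J / 2.0) * math.sqrt(ll) / n
    return bias, variance


if __name__ == "__main__":
    n, R, s = 10_000, 1.0, 1.0
    j_star = balancing_level(n, R, s)
    print("J* =", j_star)
    print("at J*    :", bias_and_variance(n, R, s, j_star))
    print("at J* - 1:", bias_and_variance(n, R, s, j_star - 1))
```

At $J^\star$ the bias is smaller than the variance term, while one level below the ordering is reversed, which is what the three-case analysis that follows exploits.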


Then we distinguish the following three cases: 
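1. In this case we work under the hypothesis that $J^\star \in \mathcal{J}_n$. This means that

$$J^\star \le J_n = \log_2\Big\{\frac{n^2}{[\log\log(n)]^3}\Big\},$$

and that

$$2^{J^\star} \le 2\left(\frac{n^2R^4}{\log\log(n)}\right)^{\frac{1}{1+4s}}.$$

Now notice that

$$\inf_{J \in \mathcal{J}_n}\Big\{R^2 2^{-2Js} + 2^{J/2}\frac{\sqrt{\log\log(n)}}{n}\Big\} \le R^2 2^{-2J^\star s} + 2^{J^\star/2}\frac{\sqrt{\log\log(n)}}{n},$$

with

$$R^2 2^{-2J^\star s} \le R^{\frac{2}{4s+1}}\left(\frac{\sqrt{\log\log(n)}}{n}\right)^{\frac{4s}{4s+1}}, \qquad 2^{J^\star/2}\frac{\sqrt{\log\log(n)}}{n} \le \sqrt{2}\,R^{\frac{2}{4s+1}}\left(\frac{\sqrt{\log\log(n)}}{n}\right)^{\frac{4s}{4s+1}}.$$

So we can write

$$\inf_{J \in \mathcal{J}_n}\Big\{R^2 2^{-2Js} + 2^{J/2}\frac{\sqrt{\log\log(n)}}{n}\Big\} \le (1 + \sqrt{2})\,R^{\frac{2}{4s+1}}\left(\frac{\sqrt{\log\log(n)}}{n}\right)^{\frac{4s}{4s+1}}.$$

2. In this second case, we assume that $J^\star > J_n$, hence, by definition of $J^\star$, for all $J \in \mathcal{J}_n$

$$2^{J/2}\frac{\sqrt{\log\log(n)}}{n} \le R^2 2^{-2Js}.$$

Consequently we get

$$\inf_{J \in \mathcal{J}_n}\Big\{R^2 2^{-2Js} + 2^{J/2}\frac{\sqrt{\log\log(n)}}{n}\Big\} \le 2R^2 2^{-2J_n s} \le 2^{2s+1}R^2\left(\frac{[\log\log(n)]^3}{n^2}\right)^{2s}.$$

3. In this last case, we assume $J^\star < 0$. Under this hypothesis, by definition of $J^\star$, for all $J \in \mathcal{J}_n$

$$R^2 2^{-2Js} \le 2^{J/2}\frac{\sqrt{\log\log(n)}}{n}.$$

Taking $J = 0$, we get

$$\inf_{J \in \mathcal{J}_n}\Big\{R^2 2^{-2Js} + 2^{J/2}\frac{\sqrt{\log\log(n)}}{n}\Big\} \le R^2 + \frac{\sqrt{\log\log(n)}}{n} \le 2\,\frac{\sqrt{\log\log(n)}}{n}.$$

And this completes the proof of Corollary 2.2.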
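4.3 Proof of Lemma 4.2

We can prove Lemma 4.2 by using either Theorem 3.4 in [38] or Theorem 3.3 in [29]. From these results we know that there exists some absolute constant $C > 0$ such that, for all $u > 0$,

$$\mathbb{P}_f^{\otimes n}\left\{|U_{n,J}| > \frac{C}{n(n-1)}\Big[A_1\sqrt{u} + A_2 u + A_3 u^{3/2} + A_4 u^2\Big]\right\} \le 5.6\,e^{-u},$$

where

• $A_1^2 = n(n-1)\,\mathbb{E}\big[g_J^2(Z_1, Z_2)\big]$,

• $A_2 = \sup\Big\{\Big|\mathbb{E}\sum_{i=1}^{n}\sum_{j=1}^{i-1} g_J(Z_1, Z_2)\,a_i(Z_1)\,b_j(Z_2)\Big| : \mathbb{E}\Big[\sum_{i=1}^{n}a_i^2(Z_1)\Big] \le 1 \text{ and } \mathbb{E}\Big[\sum_{j=1}^{n}b_j^2(Z_2)\Big] \le 1\Big\}$,

• $A_3^2 = n\,\sup_{z}\big\{\mathbb{E}\big[g_J^2(z, Z_2)\big]\big\}$,

• $A_4 = \sup_{z_1, z_2}|g_J(z_1, z_2)|$,

and, from the definition of $U_{n,J}$,

$$g_J(z_1, z_2) = \sum_{k}\big\{y_1\phi_{J,k}(G_X(x_1)) - \theta_{J,k}\big\}\cdot\big\{y_2\phi_{J,k}(G_X(x_2)) - \theta_{J,k}\big\},$$

with $\theta_{J,k} = \langle f, \phi_{J,k}(G_X)\rangle_{L^2(G_X)} = \mathbb{E}\big[Y\phi_{J,k}(G_X(X))\big]$.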
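To fix ideas about the kernel $g_J$, here is a small sketch that evaluates it for the Haar scaling function with a uniform design, so that $G_X$ is the identity on $[0,1]$. These choices, the function names, and the normalisation $U_{n,J} = \frac{1}{n(n-1)}\sum_{i \ne j} g_J(Z_i, Z_j)$ are assumptions made for illustration; the paper works with a general compactly supported scaling function and a general $G_X$.

```python
def haar_phi(J, k, t):
    """Haar scaling function phi_{J,k}(t) = 2^{J/2} * 1{k <= 2^J t < k + 1}."""
    return 2.0 ** (J / 2.0) if k <= (2 ** J) * t < k + 1 else 0.0


def kernel_g(J, theta, z1, z2, G=lambda x: x):
    """g_J(z1, z2) = sum_k (y1 phi_k(G(x1)) - theta_k)(y2 phi_k(G(x2)) - theta_k)."""
    (x1, y1), (x2, y2) = z1, z2
    return sum((y1 * haar_phi(J, k, G(x1)) - theta[k])
               * (y2 * haar_phi(J, k, G(x2)) - theta[k])
               for k in range(2 ** J))


def u_statistic(J, theta, sample, G=lambda x: x):
    """Average of g_J over ordered pairs i != j (assumed normalisation)."""
    n = len(sample)
    total = sum(kernel_g(J, theta, sample[i], sample[j], G)
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


if __name__ == "__main__":
    J = 2
    # For f = 1 and uniform X, theta_{J,k} = 2^{-J/2} for every k.
    theta = [2.0 ** (-J / 2.0)] * (2 ** J)
    sample = [(0.1, 1.0), (0.2, 1.0), (0.6, 1.0)]
    print(kernel_g(J, theta, sample[0], sample[1]))  # same dyadic cell
    print(kernel_g(J, theta, sample[0], sample[2]))  # different cells
    print(u_statistic(J, theta, sample))
```

In this toy setting the kernel takes the value $2^J - 1$ when the two design points fall in the same dyadic cell and $-1$ otherwise, which makes the degenerate (mean-zero) structure of $U_{n,J}$ easy to see.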

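In the following we will bound separately each of these four terms using some specific properties of the warped wavelet basis introduced in Section 2.1. In this section, $\phi(\cdot)$ is the compactly supported scaling function used to generate our basis. If $\mathrm{supp}(\phi) \subset [0, L]$ then, for any $k$ and $j$ in $\mathbb{Z}$ we put

$$I_{j,k} = \Big[\frac{k}{2^j}, \frac{k + L}{2^j}\Big] \quad\text{and}\quad \tilde I_{j,k} = G_X^{-1}(I_{j,k}),$$

so that $\mathrm{supp}(\phi_{j,k}) \subset I_{j,k}$. Notice that

$$|k_1 - k_2| > L \;\Rightarrow\; I_{j,k_1} \cap I_{j,k_2} = \emptyset,$$

and

$$\mathrm{supp}(\phi_{j,k} \circ G_X) \subset G_X^{-1}(I_{j,k}) = \tilde I_{j,k} \quad\text{with}\quad \mathbb{1}_{\tilde I_{j,k}}(x) = 1 \iff \mathbb{1}_{I_{j,k}}(G_X(x)) = 1.$$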

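4.3.1 A bound for $A_1$

Since

$$g_J^2(z_1, z_2) = \sum_{k,k'}\{y_1\phi_{J,k}(G_X(x_1)) - \theta_{J,k}\}\{y_1\phi_{J,k'}(G_X(x_1)) - \theta_{J,k'}\}\{y_2\phi_{J,k}(G_X(x_2)) - \theta_{J,k}\}\{y_2\phi_{J,k'}(G_X(x_2)) - \theta_{J,k'}\},$$

by the independence and identical distribution of the sample $\{Z_i\}_{i \in \{1,\dots,n\}}$ we have

$$\mathbb{E}\big[g_J^2(Z_1, Z_2)\big] = \sum_{k,k'}\Big\{\mathbb{E}\big(Y\phi_{J,k}(G_X(X)) - \theta_{J,k}\big)\big(Y\phi_{J,k'}(G_X(X)) - \theta_{J,k'}\big)\Big\}^2.$$

Now

$$\begin{aligned}
&\mathbb{E}\big(Y\phi_{J,k}(G_X(X)) - \theta_{J,k}\big)\big(Y\phi_{J,k'}(G_X(X)) - \theta_{J,k'}\big) \\
&\quad= \mathbb{E}\Big\{\big[(f(X) + \varepsilon)\phi_{J,k}(G_X(X)) - \theta_{J,k}\big]\cdot\big[(f(X) + \varepsilon)\phi_{J,k'}(G_X(X)) - \theta_{J,k'}\big]\Big\} \\
&\quad= \mathbb{E}\big[f^2(X)\phi_{J,k}(G_X(X))\phi_{J,k'}(G_X(X))\big] - \theta_{J,k'}\,\mathbb{E}\big[f(X)\phi_{J,k}(G_X(X))\big] - \theta_{J,k}\,\mathbb{E}\big[f(X)\phi_{J,k'}(G_X(X))\big] \\
&\qquad\quad + \theta_{J,k}\theta_{J,k'} + \mathbb{E}\big[\phi_{J,k}(G_X(X))\phi_{J,k'}(G_X(X))\,\mathbb{E}(\varepsilon^2|X)\big] \\
&\quad= \mathbb{E}\big\{[f^2(X) + \sigma^2(X)]\phi_{J,k}(G_X(X))\phi_{J,k'}(G_X(X))\big\} - \theta_{J,k}\theta_{J,k'}.
\end{aligned}$$

Hence, defining $\tau(x) = f^2(x) + \sigma^2(x)$ and using the inequality $(a - b)^2 \le 2(a^2 + b^2)$, we get

$$\begin{aligned}
\mathbb{E}\big[g_J^2(Z_1, Z_2)\big] &= \sum_{k,k'}\Big\{\mathbb{E}\big[\tau(X)\phi_{J,k}(G_X(X))\phi_{J,k'}(G_X(X))\big] - \theta_{J,k}\theta_{J,k'}\Big\}^2 \\
&\le 2\sum_{k,k'}\Big\{\mathbb{E}\big[\tau(X)\phi_{J,k}(G_X(X))\phi_{J,k'}(G_X(X))\big]\Big\}^2 + 2\sum_{k,k'}\big(\theta_{J,k}\theta_{J,k'}\big)^2 \\
&\le 2\sum_{k,k'}\Big\{\mathbb{E}\big[\tau(X)\phi_{J,k}(G_X(X))\phi_{J,k'}(G_X(X))\big]\Big\}^2 + 2\Big(\sum_{k}\theta_{J,k}^2\Big)^2.
\end{aligned}$$

At this point we proceed by bounding separately the two terms in the previous equation.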

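• Let

$$A_{k,k'} = \tilde I_{J,k} \cap \tilde I_{J,k'} \;\Rightarrow\; A_{k,k'} \subset \tilde I_{J,k} \text{ and } A_{k,k'} \subset \tilde I_{J,k'},$$

and

$$\mathcal{J}(k) = \{k' \in \mathbb{Z} : I_{J,k} \cap I_{J,k'} \ne \emptyset\} = \{k' \in \mathbb{Z} : |k - k'| \le L\},$$

with $\mathrm{card}(\mathcal{J}(k)) = 2L + 1$. Hence

$$\begin{aligned}
\sum_{k,k'}\Big\{\mathbb{E}\big[\tau(X)\phi_{J,k}(G_X(X))\phi_{J,k'}(G_X(X))\big]\Big\}^2 &\le \big[2^J\|\phi\|_\infty^2\big]^2\sum_{k,k'}\Big\{\mathbb{E}\big[\tau(X)\mathbb{1}_{A_{k,k'}}(X)\big]\Big\}^2 \\
&\le \big[2^J\|\phi\|_\infty^2\big]^2\sum_{k}\sum_{k' \in \mathcal{J}(k)}\mathbb{E}\big[\tau(X)\mathbb{1}_{\tilde I_{J,k}}(X)\big]\,\mathbb{E}\big[\tau(X)\mathbb{1}_{\tilde I_{J,k'}}(X)\big] \\
&\le \big[\tau_\infty 2^J\|\phi\|_\infty^2\big]^2\sum_{k}\sum_{k' \in \mathcal{J}(k)}\int_{I_{J,k}}dx\int_{I_{J,k'}}dx \\
&\le \big[\tau_\infty 2^J\|\phi\|_\infty^2\big]^2\,2^{-J}L(2L + 1)\sum_{k}\int_{I_{J,k}}dx \le 2^J\big[\tau_\infty\|\phi\|_\infty^2\big]^2 L^2(2L + 1),
\end{aligned} \tag{12}$$

where the last inequality comes from the fact that for any function $w \in L^1(\mathbb{R})$

$$\sum_{k}\int_{I_{J,k}}w(x)\,dx \le L\int w(x)\,dx.$$

• We have the following two bounds:

1. $\sum_{k}\theta_{J,k}^2 \le \|f\|^2_{L^2(G_X)} \le \|f\|_\infty^2$.

2. By using again the inequality we just mentioned, we obtain

$$\begin{aligned}
\sum_{k}\theta_{J,k}^2 = \sum_{k}\Big(\int f(G_X^{-1}(x))\phi_{J,k}(x)\,dx\Big)^2 &\le 2^J\|\phi\|_\infty^2\|f\|_\infty\sum_{k}\int_{I_{J,k}}|f(G_X^{-1}(x))|\,dx \\
&\le 2^J\|\phi\|_\infty^2\|f\|_\infty L\int_{[0,1]}|f(G_X^{-1}(x))|\,dx \le 2^J\|f\|_\infty^2\|\phi\|_\infty^2 L.
\end{aligned}$$

Combining these two inequalities we can write

$$\Big(\sum_{k}\theta_{J,k}^2\Big)^2 \le 2^J\|f\|_\infty^4\|\phi\|_\infty^2 L \le 2^J\tau_\infty^2\|\phi\|_\infty^2 L. \tag{13}$$

Finally, from Equations (12) and (13), we obtain

$$A_1^2 \le n(n-1)\,2^J\tau_\infty^2\,C_1^2(\phi) \;\Rightarrow\; A_1 \le n\,C_1(\phi)\,\tau_\infty 2^{J/2}, \tag{14}$$

where $C_1^2(\phi) = 2\|\phi\|_\infty^2 L\{1 + \|\phi\|_\infty^2 L(2L + 1)\}$.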
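4.3.2 A bound for $A_2$

Let $\varphi_{J,k}(z) = y\,\phi_{J,k}(G_X(x))$, then

$$\begin{aligned}
\mathbb{E}\big(g_J(Z_1, Z_2)a_i(Z_1)b_j(Z_2)\big) &= \sum_{k}\Big\{\mathbb{E}\big[a_i(Z_1)\big(\varphi_{J,k}(Z_1) - \theta_{J,k}\big)\big]\Big\}\Big\{\mathbb{E}\big[b_j(Z_2)\big(\varphi_{J,k}(Z_2) - \theta_{J,k}\big)\big]\Big\} \\
&= \sum_{k}\Big\{\mathbb{E}[a_i(Z)\varphi_{J,k}(Z)]\cdot\mathbb{E}[b_j(Z)\varphi_{J,k}(Z)] - \theta_{J,k}\,\mathbb{E}[a_i(Z)\varphi_{J,k}(Z)]\cdot\mathbb{E}[b_j(Z)] \\
&\qquad\quad - \theta_{J,k}\,\mathbb{E}[b_j(Z)\varphi_{J,k}(Z)]\cdot\mathbb{E}[a_i(Z)] + \theta_{J,k}^2\,\mathbb{E}[a_i(Z)]\,\mathbb{E}[b_j(Z)]\Big\} \\
&= \Big\{\sum_{k}\mathbb{E}[a_i(Z)\varphi_{J,k}(Z)]\cdot\mathbb{E}[b_j(Z)\varphi_{J,k}(Z)]\Big\} + \Big\{\mathbb{E}[a_i(Z)]\,\mathbb{E}[b_j(Z)]\sum_{k}\theta_{J,k}^2\Big\} \\
&\qquad - \mathbb{E}[a_i(Z)]\cdot\mathbb{E}\Big[b_j(Z)\sum_{k}\theta_{J,k}\varphi_{J,k}(Z)\Big] - \mathbb{E}[b_j(Z)]\cdot\mathbb{E}\Big[a_i(Z)\sum_{k}\theta_{J,k}\varphi_{J,k}(Z)\Big] \\
&= (\mathrm{i}) + (\mathrm{ii}) - (\mathrm{iii}) - (\mathrm{iv}).
\end{aligned}$$

Notice that

$$\sum_{k}\theta_{J,k}\varphi_{J,k}(z) = \sum_{k}\theta_{J,k}\,y\,\phi_{J,k}(G_X(x)) = y\Big\{\sum_{k}\theta_{J,k}\phi_{J,k}(G_X(x))\Big\} = y\,\Pi_{S_J}(f)(x).$$

Next, we will bound separately each of the four terms in the previous equation.

1. By applying the Cauchy-Schwarz inequality twice we get

$$|(\mathrm{i})| \le \Big\{\sum_{k}\mathbb{E}\big[a_i^2(Z)\mathbb{1}_{\mathcal{A}(k)}(Z)\big]\,\mathbb{E}\big[\varphi_{J,k}^2(Z)\big]\Big\}^{1/2}\Big\{\sum_{k}\mathbb{E}\big[b_j^2(Z)\mathbb{1}_{\mathcal{A}(k)}(Z)\big]\,\mathbb{E}\big[\varphi_{J,k}^2(Z)\big]\Big\}^{1/2},$$

where $\mathcal{A}(k) = \{z = (x, y) \in (0,1) \times [-M, M] : x \in \tilde I_{J,k} \text{ and } y \in [-M, M]\}$. Now we have

• By usual arguments

$$\mathbb{E}\big[\varphi_{J,k}^2(Z)\big] = \mathbb{E}\big[Y^2\phi_{J,k}^2(G_X(X))\big] = \mathbb{E}\big[\tau(X)\phi_{J,k}^2(G_X(X))\big] \le \tau_\infty 2^J\|\phi\|_\infty^2\int\mathbb{1}_{I_{J,k}}(G_X(x))\,dG_X(x) \le \tau_\infty 2^J\|\phi\|_\infty^2\frac{L}{2^J} = \tau_\infty\|\phi\|_\infty^2 L.$$

• $\sum_{k}\mathbb{E}\big[a_i^2(Z)\mathbb{1}_{\mathcal{A}(k)}(Z)\big] \le L\,\mathbb{E}\big[a_i^2(Z)\big]$.

• $\sum_{k}\mathbb{E}\big[b_j^2(Z)\mathbb{1}_{\mathcal{A}(k)}(Z)\big] \le L\,\mathbb{E}\big[b_j^2(Z)\big]$.

Consequently we get

$$|(\mathrm{i})| \le \Big\{\tau_\infty\|\phi\|_\infty^2 L^2\,\mathbb{E}\big[a_i^2(Z)\big]\Big\}^{1/2}\Big\{\tau_\infty\|\phi\|_\infty^2 L^2\,\mathbb{E}\big[b_j^2(Z)\big]\Big\}^{1/2} = \tau_\infty\|\phi\|_\infty^2 L^2\sqrt{\mathbb{E}[a_i^2(Z)]}\sqrt{\mathbb{E}[b_j^2(Z)]}.$$

And finally

$$\sum_{i=1}^{n}\sum_{j=1}^{i-1}|(\mathrm{i})| \le \tau_\infty\|\phi\|_\infty^2 L^2\sum_{i=1}^{n}\sqrt{\mathbb{E}[a_i^2(Z)]}\cdot\sum_{j=1}^{n}\sqrt{\mathbb{E}[b_j^2(Z)]} \le n\,\tau_\infty\|\phi\|_\infty^2 L^2\sqrt{\sum_{i=1}^{n}\mathbb{E}[a_i^2(Z)]}\sqrt{\sum_{j=1}^{n}\mathbb{E}[b_j^2(Z)]} \le n\,\tau_\infty\|\phi\|_\infty^2 L^2.$$

2. By definition of $a_i(\cdot)$ and $b_j(\cdot)$, we obtain

$$\sum_{i=1}^{n}\sum_{j=1}^{i-1}|(\mathrm{ii})| \le \Big(\sum_{k}\theta_{J,k}^2\Big)\sum_{i=1}^{n}\sum_{j=1}^{i-1}\big|\mathbb{E}[a_i(Z)]\big|\,\big|\mathbb{E}[b_j(Z)]\big| \le n\,\|f\|^2_{L^2(G_X)}\sqrt{\sum_{i=1}^{n}\mathbb{E}[a_i^2(Z)]}\sqrt{\sum_{j=1}^{n}\mathbb{E}[b_j^2(Z)]} \le n\,\tau_\infty.$$

3. By the Cauchy-Schwarz inequality we have

$$\begin{aligned}
\sum_{i=1}^{n}\sum_{j=1}^{i-1}|(\mathrm{iii})| &= \sum_{i=1}^{n}\sum_{j=1}^{i-1}\Big|\mathbb{E}[a_i(Z)]\cdot\mathbb{E}\big[b_j(Z)\,Y\,\Pi_{S_J}(f)(X)\big]\Big| \\
&\le \sum_{i=1}^{n}\sum_{j=1}^{i-1}\big\{\mathbb{E}[a_i^2(Z)]\big\}^{1/2}\big\{\mathbb{E}[b_j^2(Z)]\big\}^{1/2}\big\{\mathbb{E}[Y\,\Pi_{S_J}(f)(X)]^2\big\}^{1/2} \\
&\le n\big(\mathbb{E}[Y^2\,\Pi_{S_J}^2(f)(X)]\big)^{1/2}\sqrt{\mathbb{E}\sum_{i}a_i^2(Z)}\sqrt{\mathbb{E}\sum_{j}b_j^2(Z)} \le n\big(\mathbb{E}[Y^2\,\Pi_{S_J}^2(f)(X)]\big)^{1/2} \\
&\le n\big(\mathbb{E}[\tau(X)\,\Pi_{S_J}^2(f)(X)]\big)^{1/2} \le n\sqrt{\tau_\infty\,\mathbb{E}[f^2(X)]} \le n\sqrt{\tau_\infty\|f\|_\infty^2} \le n\,\tau_\infty.
\end{aligned}$$

4. Proceeding as in the previous point, we get

$$\sum_{i=1}^{n}\sum_{j=1}^{i-1}|(\mathrm{iv})| = \sum_{i=1}^{n}\sum_{j=1}^{i-1}\Big|\mathbb{E}[b_j(Z)]\cdot\mathbb{E}\big[a_i(Z)\,Y\,\Pi_{S_J}(f)(X)\big]\Big| \le n\,\tau_\infty.$$

Combining all the previous inequalities, we can write

$$A_2 \le \big\{n\,\tau_\infty\|\phi\|_\infty^2 L^2 + 3n\,\tau_\infty\big\} = C_2(\phi)\,n\,\tau_\infty,$$

where $C_2(\phi) = \|\phi\|_\infty^2 L^2 + 3$.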



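4.3.3 A bound for $A_3$

Let us start by writing

$$\begin{aligned}
\mathbb{E}\big[g_J^2(z, Z_2)\big] &= \sum_{k,k'}\{y\phi_{J,k}(G_X(x)) - \theta_{J,k}\}\{y\phi_{J,k'}(G_X(x)) - \theta_{J,k'}\} \times \mathbb{E}\big[\big(Y_2\phi_{J,k}(G_X(X_2)) - \theta_{J,k}\big)\big(Y_2\phi_{J,k'}(G_X(X_2)) - \theta_{J,k'}\big)\big] \\
&= \sum_{k,k'}\{\cdot\}\{\cdot\}\Big(\mathbb{E}\big[\tau(X)\phi_{J,k}(G_X(X))\phi_{J,k'}(G_X(X))\big] - \theta_{J,k}\theta_{J,k'}\Big) \\
&= \sum_{k,k'}\{\cdot\}\{\cdot\}\,\mathbb{E}[\,\cdot\,] - \Big[\sum_{k}\theta_{J,k}\big(y\phi_{J,k}(G_X(x)) - \theta_{J,k}\big)\Big]^2 \\
&\le \sum_{k,k'}\big(y\phi_{J,k}(G_X(x)) - \theta_{J,k}\big)\big(y\phi_{J,k'}(G_X(x)) - \theta_{J,k'}\big)\,\mathbb{E}\big[\tau(X)\phi_{J,k}(G_X(X))\phi_{J,k'}(G_X(X))\big].
\end{aligned}$$

Now we have

• By the same arguments used in bounding $A_1$, we get

$$\mathbb{E}\big[\tau(X)\phi_{J,k}(G_X(X))\phi_{J,k'}(G_X(X))\big] \le \tau_\infty 2^J\|\phi\|_\infty^2\int\mathbb{1}_{I_{J,k} \cap I_{J,k'}}(G_X(x))\,dG_X(x) = \tau_\infty 2^J\|\phi\|_\infty^2\int_{I_{J,k} \cap I_{J,k'}}dx \le \tau_\infty\|\phi\|_\infty^2 L.$$

• Since

$$\sum_{k,k'}\big(y\phi_{J,k}(G_X(x)) - \theta_{J,k}\big)\big(y\phi_{J,k'}(G_X(x)) - \theta_{J,k'}\big) = \Big[\sum_{k}\big(y\phi_{J,k}(G_X(x)) - \theta_{J,k}\big)\Big]^2,$$

we need to bound separately $\big|\sum_{k}\theta_{J,k}\big|$ and $\sup_{x,y}\big|\sum_{k}y\phi_{J,k}(G_X(x))\big|$ as follows:

1. $\sup_{x,y}\big|\sum_{k}y\phi_{J,k}(G_X(x))\big| \le M\,2^{J/2}\|\phi\|_\infty(2L + 1)$,

2. $\big|\sum_{k}\theta_{J,k}\big| = \big|\mathbb{E}\{f(X)\sum_{k}\phi_{J,k}(G_X(X))\}\big| \le \mathbb{E}[|f(X)|]\,\big\|\sum_{k}\phi_{J,k}(G_X)\big\|_\infty \le \|f\|_\infty 2^{J/2}\|\phi\|_\infty(2L + 1)$.

Consequently

$$\sup_{x,y}\Big[\sum_{k}\big\{y\phi_{J,k}(G_X(x)) - \theta_{J,k}\big\}\Big]^2 \le \Big\{\sup_{x,y}\Big|\sum_{k}y\phi_{J,k}(G_X(x))\Big| + \Big|\sum_{k}\theta_{J,k}\Big|\Big\}^2 \le \big[(M + \|f\|_\infty)\,2^{J/2}\|\phi\|_\infty(2L + 1)\big]^2.$$

Hence, finally

$$\sup_{z}\mathbb{E}\big(g_J^2(z, Z_2)\big) \le \tau_\infty\|\phi\|_\infty^2 L\,\sup_{x,y}\Big[\sum_{k}\big\{y\phi_{J,k}(G_X(x)) - \theta_{J,k}\big\}\Big]^2 \le \tau_\infty(M + \|f\|_\infty)^2\,2^J\|\phi\|_\infty^2 L\,\big[\|\phi\|_\infty(2L + 1)\big]^2,$$

so that

$$A_3 \le C_3(\phi)\,(M + \|f\|_\infty)\sqrt{n\,\tau_\infty 2^J} \le 2C_3(\phi)\,M\sqrt{n\,\tau_\infty 2^J},$$

with $C_3(\phi) = \|\phi\|_\infty^2\sqrt{L}\,(2L + 1)$.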


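4.3.4 A bound for $A_4$

Bearing in mind the following inequalities

• $\sup_{x,y}\big|\sum_{k}\{y\phi_{J,k}(G_X(x)) - \theta_{J,k}\}\big| \le \|\phi\|_\infty(2L + 1)(M + \|f\|_\infty)\,2^{J/2}$,

• $\sup_{x,y}|y\phi_{J,k}(G_X(x))| \le M\,2^{J/2}\|\phi\|_\infty$,

• $|\theta_{J,k}| \le \mathbb{E}|f(X)\phi_{J,k}(G_X(X))| \le \|f\|_\infty 2^{J/2}\|\phi\|_\infty$,

we end up with the following bound

$$A_4 = \sup_{z_1, z_2}|g_J(z_1, z_2)| \le C_4(\phi)\,2^J(M + \|f\|_\infty)^2, \tag{16}$$

where $C_4(\phi) = \|\phi\|_\infty^2(2L + 1)$.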


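4.3.5 Conclusion

Up to now we have found that, for each $u > 0$, $\mathbb{P}_f^{\otimes n}\{|U_{n,J}| > \eta_J(u)\} \le 5.6\,e^{-u}$, with

$$\eta_J(u) = \frac{C}{n-1}\left[\frac{C_1(\phi)\,n\,\tau_\infty 2^{J/2}}{n}\,u^{1/2} + \frac{C_2(\phi)\,n\,\tau_\infty}{n}\,u + \frac{2C_3(\phi)\,M\sqrt{n\,\tau_\infty 2^J}}{n}\,u^{3/2} + \frac{C_4(\phi)\,M^2 2^J}{n}\,u^2\right].$$

By applying the inequality $2ab \le a^2 + b^2$, we have

$$C_3(\phi)\,2\sqrt{\tau_\infty u}\;\frac{M\,2^{J/2}u}{\sqrt{n}} \le C_3(\phi)\Big\{\tau_\infty u + M^2 2^J\frac{u^2}{n}\Big\},$$

so

$$\begin{aligned}
\eta_J(u) &\le \frac{C}{n-1}\Big\{C_1(\phi)\,\tau_\infty 2^{J/2}u^{1/2} + [C_2(\phi) + C_3(\phi)]\,\tau_\infty u + [C_3(\phi) + C_4(\phi)]\,M^2 2^J\frac{u^2}{n}\Big\} \\
&\le \frac{\kappa_0}{n-1}\Big\{\tau_\infty\sqrt{2^J u} + \tau_\infty u + M^2 2^J\frac{u^2}{n}\Big\},
\end{aligned}$$

where $\kappa_0 = C\max\{C_1(\phi),\ C_2(\phi) + C_3(\phi),\ C_3(\phi) + C_4(\phi)\}$. And this completes the proof.

□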